afk: add the autonomous issue-implementer loop (SHIPS DISABLED)
Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.
The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):
pure: types, dispatch_policy, run_state_machine, phase_checklist,
config, issue_implementer_prompt
adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
notifier
loops: poller — CronJob tick #1: list_ready -> select_dispatchable
-> dispatch + stamp the in-progress lock (label only
AFTER a successful dispatch, so a failed dispatch
never leaves a phantom lock). Per-repo lock derived
from the ready set, since the CronJob is stateless
between ticks.
watcher — CronJob tick #2: assemble RunState from snapshot +
CI -> next_action -> act (close on success; relabel
ready-for-human + ring the doorbell on the two
escalations; dispatch a corrective turn on
fix-forward; refresh the progress checklist).
SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.
Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:15:11 +00:00
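The two-gate "ships disabled" posture described in the commit message above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of app/afk/config.py: the `Config` dataclass shape, the tuple-typed `allowlist`, and the `is_armed` helper are hypothetical names introduced here. Only the defaults (`kill_switch=True`, empty allowlist) and the requirement that BOTH gates be opened come from the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical config shape; the defaults mirror the shipped-disabled posture."""

    kill_switch: bool = True  # gate 1: on by default, so nothing dispatches
    allowlist: tuple[str, ...] = ()  # gate 2: no repos enrolled by default


def is_armed(config: Config, repo: str) -> bool:
    """Assumed helper: a repo is dispatchable only when BOTH gates are open for it."""
    return (not config.kill_switch) and repo in config.allowlist


default = Config()
only_switch_cleared = Config(kill_switch=False)
only_repo_enrolled = Config(allowlist=("infra",))
armed = Config(kill_switch=False, allowlist=("infra",))
```

Under these assumptions, clearing the kill switch alone or enrolling a repo alone leaves the loop inert, which is the property the message relies on when it says one mistyped setting cannot arm every repo.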
"""Tests for ``app.afk.dispatch_policy.select_dispatchable``, the pure gate that
turns a pile of ready issues into the ordered set the loop may dispatch *now*.

The function is PURE (no IO), so every test here is a plain in-memory call over
the fakes/factories in ``conftest`` (``make_issue`` / ``make_config``); nothing
touches a real T3 server, tracker, or cluster. The suite walks the full
dispatchability matrix (trust gate, allowlist, per-repo lock, blocked_by,
kill switch) plus the priority ordering and the one-agent-per-repo invariant.

afk: wire the T3 adapter to the REAL orchestration contract + fix priority
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:
- dispatch sends BARE commands keyed by `type` (not a `command` string), with
client-minted threadId/commandId/messageId + createdAt; the server replies
{sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
in), so dispatch ensures the repo's project (snapshot -> project.create iff
absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
running/in_progress/pending->running, errored->error), not a non-existent
top-level `status` field.
Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:27:00 +00:00
Ordering contract under test: **lower ``priority`` value first** (P0 before P1
before P2; most urgent wins), matching tracker conventions and
``Issue.priority``'s own docstring, with a deterministic tiebreaker (ascending
issue number) so the output is stable regardless of input order.
"""

import itertools

import pytest

from app.afk import dispatch_policy
from app.afk.types import DispatchDecision, Issue


# --------------------------------------------------------------------------- #
# Helpers: keep assertions terse and intent-revealing.
# --------------------------------------------------------------------------- #
def _selected_numbers(decisions: list[DispatchDecision]) -> list[int]:
    """The issue numbers, in the order the policy returned them."""
    return [d.issue.number for d in decisions]


def _selected_set(decisions: list[DispatchDecision]) -> set[int]:
    return {d.issue.number for d in decisions}


# --------------------------------------------------------------------------- #
# Return shape & purity.
# --------------------------------------------------------------------------- #
def test_returns_list_of_dispatch_decisions(make_issue, make_config):
    issue = make_issue(number=7, repo="infra")
    decisions = dispatch_policy.select_dispatchable([issue], make_config(), set())
    assert isinstance(decisions, list)
    assert len(decisions) == 1
    assert isinstance(decisions[0], DispatchDecision)
    assert decisions[0].issue is issue
    assert isinstance(decisions[0].reason, str) and decisions[0].reason  # non-empty


def test_empty_input_yields_empty_output(make_config):
    assert dispatch_policy.select_dispatchable([], make_config(), set()) == []


def test_does_not_mutate_inputs(make_issue, make_config):
    issues = [make_issue(number=1, priority=0), make_issue(number=2, priority=9)]
    issues_snapshot = list(issues)
    config = make_config(allowlist=["infra"])
    in_flight: set[str] = set()

    dispatch_policy.select_dispatchable(issues, config, in_flight)

    # Caller's list (and its order) and the lock set are left untouched.
    assert issues == issues_snapshot
    assert [i.number for i in issues] == [1, 2]
    assert in_flight == set()
    assert config.allowlist == ["infra"]


def test_decision_wraps_the_same_issue_object(make_issue, make_config):
    issue = make_issue(number=42)
    [decision] = dispatch_policy.select_dispatchable([issue], make_config(), set())
    assert decision.issue is issue  # identity, not a copy


# --------------------------------------------------------------------------- #
# Kill switch: highest-precedence short-circuit.
# --------------------------------------------------------------------------- #
def test_kill_switch_returns_empty_even_with_perfect_issues(make_issue, make_config):
    issues = [make_issue(number=n, repo="infra") for n in range(1, 6)]
    config = make_config(allowlist=["infra"], kill_switch=True)
    assert dispatch_policy.select_dispatchable(issues, config, set()) == []


def test_kill_switch_off_dispatches(make_issue, make_config):
    issue = make_issue(repo="infra")
    config = make_config(allowlist=["infra"], kill_switch=False)
    assert len(dispatch_policy.select_dispatchable([issue], config, set())) == 1


def test_production_default_config_dispatches_nothing(make_issue):
    """The shipped default (kill switch ON, empty allowlist) is inert: even a
    pristine, trusted issue is never selected."""
    from app.afk import config as afk_config

    issue = make_issue(repo="infra")
    assert dispatch_policy.select_dispatchable([issue], afk_config.default(), set()) == []


# --------------------------------------------------------------------------- #
# Trust gate.
# --------------------------------------------------------------------------- #
def test_untrusted_issue_is_skipped(make_issue, make_config):
    issue = make_issue(repo="infra", labeled_by_trusted=False)
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_trusted_issue_is_eligible(make_issue, make_config):
    issue = make_issue(repo="infra", labeled_by_trusted=True)
    assert len(dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set())) == 1


def test_trust_gate_filters_only_untrusted(make_issue, make_config):
    trusted = make_issue(number=1, repo="infra", labeled_by_trusted=True)
    untrusted = make_issue(number=2, repo="infra", labeled_by_trusted=False)
    decisions = dispatch_policy.select_dispatchable(
        [trusted, untrusted], make_config(allowlist=["infra"]), set()
    )
    assert _selected_set(decisions) == {1}


# --------------------------------------------------------------------------- #
# Allowlist membership.
# --------------------------------------------------------------------------- #
def test_repo_not_in_allowlist_is_skipped(make_issue, make_config):
    issue = make_issue(repo="some-other-repo")
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_empty_allowlist_dispatches_nothing(make_issue, make_config):
    issue = make_issue(repo="infra")
    # kill switch off but allowlist empty -> still inert (the two-gate posture).
    config = make_config(allowlist=[], kill_switch=False)
    assert dispatch_policy.select_dispatchable([issue], config, set()) == []


def test_allowlist_selects_only_listed_repos(make_issue, make_config):
    a = make_issue(number=1, repo="infra")
    b = make_issue(number=2, repo="realestate-crawler")
    c = make_issue(number=3, repo="not-allowed")
    decisions = dispatch_policy.select_dispatchable(
        [a, b, c], make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    assert _selected_set(decisions) == {1, 2}


# --------------------------------------------------------------------------- #
# Per-repo lock (in_flight_repos).
# --------------------------------------------------------------------------- #
def test_repo_already_in_flight_is_skipped(make_issue, make_config):
    issue = make_issue(repo="infra")
    decisions = dispatch_policy.select_dispatchable(
        [issue], make_config(allowlist=["infra"]), in_flight_repos={"infra"}
    )
    assert decisions == []


def test_in_flight_lock_is_per_repo(make_issue, make_config):
    locked = make_issue(number=1, repo="infra")
    free = make_issue(number=2, repo="realestate-crawler")
    decisions = dispatch_policy.select_dispatchable(
        [locked, free],
        make_config(allowlist=["infra", "realestate-crawler"]),
        in_flight_repos={"infra"},
    )
    assert _selected_set(decisions) == {2}  # only the unlocked repo's issue runs


def test_all_repos_in_flight_dispatches_nothing(make_issue, make_config):
    a = make_issue(number=1, repo="infra")
    b = make_issue(number=2, repo="realestate-crawler")
    decisions = dispatch_policy.select_dispatchable(
        [a, b],
        make_config(allowlist=["infra", "realestate-crawler"]),
        in_flight_repos={"infra", "realestate-crawler"},
    )
    assert decisions == []


# --------------------------------------------------------------------------- #
# One-agent-per-repo invariant: at most ONE decision per repo per call.
#
# The whole design serialises agents within a repo (two would collide on the
# working tree). A single call must therefore never hand back two issues for the
# same repo, even when both are eligible and the repo is not yet in-flight.
# --------------------------------------------------------------------------- #
def test_at_most_one_decision_per_repo(make_issue, make_config):
    urgent = make_issue(number=1, repo="infra", priority=1)
    minor = make_issue(number=2, repo="infra", priority=9)
    decisions = dispatch_policy.select_dispatchable(
        [urgent, minor], make_config(allowlist=["infra"]), set()
    )
    assert len(decisions) == 1
    assert decisions[0].issue.number == 1  # most urgent (lowest value) wins the slot


def test_one_decision_per_repo_across_many_repos(make_issue, make_config):
    issues = [
        make_issue(number=10, repo="infra", priority=1),
        make_issue(number=11, repo="infra", priority=5),
        make_issue(number=20, repo="realestate-crawler", priority=3),
        make_issue(number=21, repo="realestate-crawler", priority=2),
    ]
    decisions = dispatch_policy.select_dispatchable(
        issues, make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    # One per repo, each the repo's most urgent (lowest-value) eligible issue:
    # infra -> #10 (p1 < p5); realestate-crawler -> #21 (p2 < p3).
    assert _selected_set(decisions) == {10, 21}
    repos = [d.issue.repo for d in decisions]
    assert len(repos) == len(set(repos))  # no repo appears twice


def test_ineligible_higher_priority_does_not_consume_repo_slot(make_issue, make_config):
|
afk: wire the T3 adapter to the REAL orchestration contract + fix priority
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:
- dispatch sends BARE commands keyed by `type` (not a `command` string), with
client-minted threadId/commandId/messageId + createdAt; the server replies
{sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
in), so dispatch ensures the repo's project (snapshot -> project.create iff
absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
running/in_progress/pending->running, errored->error), not a non-existent
top-level `status` field.
Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:27:00 +00:00
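The two rules in the commit message above (liveness read from latestTurn.state, and the test fake rejecting any command without a `type`) can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the names `TURN_STATE_TO_LIVENESS`, `thread_liveness` and `reject_untyped` are hypothetical, and the "unknown" fallback for unmapped states is an assumption rather than anything the commit message states.

```python
from typing import Any, Mapping

# State names come from the commit message; the table itself is illustrative.
TURN_STATE_TO_LIVENESS = {
    "completed": "idle",
    "running": "running",
    "in_progress": "running",
    "pending": "running",
    "errored": "error",
}


def thread_liveness(thread: Mapping[str, Any]) -> str:
    """Derive liveness from latestTurn.state; a top-level `status` is never read."""
    latest_turn = thread.get("latestTurn") or {}
    # Assumption: unmapped or missing states surface as "unknown" instead of a guess.
    return TURN_STATE_TO_LIVENESS.get(latest_turn.get("state"), "unknown")


def reject_untyped(command: Mapping[str, Any]) -> None:
    """Mimic a fake that fails loudly when the `type` discriminator is absent."""
    if not command.get("type"):
        raise ValueError("command lacks a `type` discriminator")
```

A payload shaped like the old guess, for example `{"command": "thread.create"}`, would raise here, which is the failure the hardened fake is meant to surface.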
    """A more-urgent issue that is itself ineligible (e.g. blocked) must not
    suppress a less-urgent *eligible* issue in the same repo: the slot goes to
    the best ELIGIBLE candidate, not merely the most urgent one."""
    blocked_urgent = make_issue(number=1, repo="infra", priority=1, blocked_by=[99])
    ready_minor = make_issue(number=2, repo="infra", priority=9)
    decisions = dispatch_policy.select_dispatchable(
        [blocked_urgent, ready_minor], make_config(allowlist=["infra"]), set()
    )
    assert _selected_numbers(decisions) == [2]


# --------------------------------------------------------------------------- #
# blocked_by gating: blocked_by holds OPEN blocker numbers.
# --------------------------------------------------------------------------- #
def test_blocked_issue_is_skipped(make_issue, make_config):
    issue = make_issue(repo="infra", blocked_by=[101])
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_unblocked_issue_with_empty_blocked_by_is_eligible(make_issue, make_config):
    issue = make_issue(repo="infra", blocked_by=[])
    assert len(dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set())) == 1


@pytest.mark.parametrize("blockers", [[1], [1, 2], [5, 6, 7]])
def test_any_open_blocker_blocks(make_issue, make_config, blockers):
    issue = make_issue(repo="infra", blocked_by=blockers)
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_blocked_filters_only_blocked(make_issue, make_config):
    ready = make_issue(number=1, repo="infra", blocked_by=[])
    blocked = make_issue(number=2, repo="realestate-crawler", blocked_by=[7])
    decisions = dispatch_policy.select_dispatchable(
        [ready, blocked], make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    assert _selected_set(decisions) == {1}
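The gating these tests pin down can be sketched as a small predicate. This is a hedged illustration only: the `Issue` dataclass is a hypothetical stand-in carrying just the fields the tests touch, `is_eligible` is an invented name, and the third argument the tests pass to `select_dispatchable` (always `set()` here) is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in for the real issue type."""

    number: int
    repo: str
    priority: int = 0
    blocked_by: List[int] = field(default_factory=list)


def is_eligible(issue: Issue, allowlist: Set[str]) -> bool:
    """Eligible only if the repo is enrolled and no blocker is open."""
    if issue.repo not in allowlist:
        return False
    # blocked_by holds OPEN blocker numbers, so any entry at all blocks dispatch.
    return not issue.blocked_by
```

Filtering with this predicate before any ordering step is what lets a blocked issue drop out without affecting an unblocked one in a different repo.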
# --------------------------------------------------------------------------- #
|
afk: wire the T3 adapter to the REAL orchestration contract + fix priority
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:
- dispatch sends BARE commands keyed by `type` (not a `command` string), with
client-minted threadId/commandId/messageId + createdAt; the server replies
{sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
in), so dispatch ensures the repo's project (snapshot -> project.create iff
absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
running/in_progress/pending->running, errored->error), not a non-existent
top-level `status` field.
Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:27:00 +00:00
|
|
|
# Priority ordering — lower priority value first, deterministic tiebreaker.
|
afk: add the autonomous issue-implementer loop (SHIPS DISABLED)
Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.
The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):
pure: types, dispatch_policy, run_state_machine, phase_checklist,
config, issue_implementer_prompt
adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
notifier
loops: poller — CronJob tick #1: list_ready -> select_dispatchable
-> dispatch + stamp the in-progress lock (label only
AFTER a successful dispatch, so a failed dispatch
never leaves a phantom lock). Per-repo lock derived
from the ready set, since the CronJob is stateless
between ticks.
watcher — CronJob tick #2: assemble RunState from snapshot +
CI -> next_action -> act (close on success; relabel
ready-for-human + ring the doorbell on the two
escalations; dispatch a corrective turn on
fix-forward; refresh the progress checklist).
SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.
Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:15:11 +00:00
|
|
|
# --------------------------------------------------------------------------- #
|
afk: wire the T3 adapter to the REAL orchestration contract + fix priority
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:
- dispatch sends BARE commands keyed by `type` (not a `command` string), with
client-minted threadId/commandId/messageId + createdAt; the server replies
{sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
in), so dispatch ensures the repo's project (snapshot -> project.create iff
absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
running/in_progress/pending->running, errored->error), not a non-existent
top-level `status` field.
Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:27:00 +00:00
|
|
|
def test_lower_priority_value_first(make_issue, make_config):
|
|
|
|
|
p1 = make_issue(number=1, repo="infra", priority=1)
|
|
|
|
|
p5 = make_issue(number=2, repo="realestate-crawler", priority=5)
|
|
|
|
|
p9 = make_issue(number=3, repo="SparkyFitness", priority=9)
|
afk: add the autonomous issue-implementer loop (SHIPS DISABLED)
Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.
The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):
pure: types, dispatch_policy, run_state_machine, phase_checklist,
config, issue_implementer_prompt
adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
notifier
loops: poller — CronJob tick #1: list_ready -> select_dispatchable
-> dispatch + stamp the in-progress lock (label only
AFTER a successful dispatch, so a failed dispatch
never leaves a phantom lock). Per-repo lock derived
from the ready set, since the CronJob is stateless
between ticks.
watcher — CronJob tick #2: assemble RunState from snapshot +
CI -> next_action -> act (close on success; relabel
ready-for-human + ring the doorbell on the two
escalations; dispatch a corrective turn on
fix-forward; refresh the progress checklist).
SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.
Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:15:11 +00:00
|
|
|
decisions = dispatch_policy.select_dispatchable(
|
afk: wire the T3 adapter to the REAL orchestration contract + fix priority
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:
- dispatch sends BARE commands keyed by `type` (not a `command` string), with
client-minted threadId/commandId/messageId + createdAt; the server replies
{sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
in), so dispatch ensures the repo's project (snapshot -> project.create iff
absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
running/in_progress/pending->running, errored->error), not a non-existent
top-level `status` field.
Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:27:00 +00:00
|
|
|
[p1, p9, p5],
|
afk: add the autonomous issue-implementer loop (SHIPS DISABLED)
Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.
The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):
pure: types, dispatch_policy, run_state_machine, phase_checklist,
config, issue_implementer_prompt
adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
notifier
loops: poller — CronJob tick #1: list_ready -> select_dispatchable
-> dispatch + stamp the in-progress lock (label only
AFTER a successful dispatch, so a failed dispatch
never leaves a phantom lock). Per-repo lock derived
from the ready set, since the CronJob is stateless
between ticks.
watcher — CronJob tick #2: assemble RunState from snapshot +
CI -> next_action -> act (close on success; relabel
ready-for-human + ring the doorbell on the two
escalations; dispatch a corrective turn on
fix-forward; refresh the progress checklist).
SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.
Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:15:11 +00:00
|
|
|
make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
|
|
|
|
|
set(),
|
|
|
|
|
)
|
afk: wire the T3 adapter to the REAL orchestration contract + fix priority
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:
- dispatch sends BARE commands keyed by `type` (not a `command` string), with
client-minted threadId/commandId/messageId + createdAt; the server replies
{sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
in), so dispatch ensures the repo's project (snapshot -> project.create iff
absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
running/in_progress/pending->running, errored->error), not a non-existent
top-level `status` field.
Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:27:00 +00:00
|
|
|
assert _selected_numbers(decisions) == [1, 2, 3] # priorities 1, 5, 9
|
afk: add the autonomous issue-implementer loop (SHIPS DISABLED)
Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.
The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):
pure: types, dispatch_policy, run_state_machine, phase_checklist,
config, issue_implementer_prompt
adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
notifier
loops: poller — CronJob tick #1: list_ready -> select_dispatchable
-> dispatch + stamp the in-progress lock (label only
AFTER a successful dispatch, so a failed dispatch
never leaves a phantom lock). Per-repo lock derived
from the ready set, since the CronJob is stateless
between ticks.
watcher — CronJob tick #2: assemble RunState from snapshot +
CI -> next_action -> act (close on success; relabel
ready-for-human + ring the doorbell on the two
escalations; dispatch a corrective turn on
fix-forward; refresh the progress checklist).
SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.
Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:15:11 +00:00
|
|
|
|
|
|
|
|
|
|
|
|
|
def test_ordering_independent_of_input_order(make_issue, make_config):
|
|
|
|
|
"""Whatever order the caller supplies issues in, the dispatch order is the
|
|
|
|
|
same — sorted purely by the policy, not by arrival."""
|
|
|
|
|
base = [
|
|
|
|
|
("infra", 10, 2),
|
|
|
|
|
("realestate-crawler", 20, 8),
|
|
|
|
|
("SparkyFitness", 30, 5),
|
|
|
|
|
("health", 40, 1),
|
|
|
|
|
]
|
|
|
|
|
allow = ["infra", "realestate-crawler", "SparkyFitness", "health"]
|
|
|
|
|
config = make_config(allowlist=allow)
|
afk: wire the T3 adapter to the REAL orchestration contract + fix priority
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:
- dispatch sends BARE commands keyed by `type` (not a `command` string), with
client-minted threadId/commandId/messageId + createdAt; the server replies
{sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
in), so dispatch ensures the repo's project (snapshot -> project.create iff
absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
running/in_progress/pending->running, errored->error), not a non-existent
top-level `status` field.
Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:27:00 +00:00
|
|
|
expected = [40, 10, 30, 20] # priorities 1,2,5,8 (most urgent first)
|
afk: add the autonomous issue-implementer loop (SHIPS DISABLED)
Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.
The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):
pure: types, dispatch_policy, run_state_machine, phase_checklist,
config, issue_implementer_prompt
adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
notifier
loops: poller — CronJob tick #1: list_ready -> select_dispatchable
-> dispatch + stamp the in-progress lock (label only
AFTER a successful dispatch, so a failed dispatch
never leaves a phantom lock). Per-repo lock derived
from the ready set, since the CronJob is stateless
between ticks.
watcher — CronJob tick #2: assemble RunState from snapshot +
CI -> next_action -> act (close on success; relabel
ready-for-human + ring the doorbell on the two
escalations; dispatch a corrective turn on
fix-forward; refresh the progress checklist).
SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.
Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:15:11 +00:00
|
|
|
|
|
|
|
|
for perm in itertools.permutations(base):
|
|
|
|
|
issues = [make_issue(number=n, repo=r, priority=p) for (r, n, p) in perm]
|
|
|
|
|
decisions = dispatch_policy.select_dispatchable(issues, config, set())
|
|
|
|
|
assert _selected_numbers(decisions) == expected
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def test_priority_ties_break_deterministically_by_issue_number(make_issue, make_config):
|
|
|
|
|
"""Equal priority across different repos -> a stable, total order. We tie-break
|
|
|
|
|
on ascending issue number so the result never depends on dict/set iteration
|
|
|
|
|
or input order."""
|
|
|
|
|
a = make_issue(number=30, repo="infra", priority=5)
|
|
|
|
|
b = make_issue(number=10, repo="realestate-crawler", priority=5)
|
|
|
|
|
c = make_issue(number=20, repo="SparkyFitness", priority=5)
|
|
|
|
|
config = make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"])
|
|
|
|
|
|
|
|
|
|
for perm in itertools.permutations([a, b, c]):
|
|
|
|
|
decisions = dispatch_policy.select_dispatchable(list(perm), config, set())
|
|
|
|
|
assert _selected_numbers(decisions) == [10, 20, 30]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def test_negative_and_zero_priorities_order_correctly(make_issue, make_config):
    neg = make_issue(number=1, repo="infra", priority=-5)
    zero = make_issue(number=2, repo="realestate-crawler", priority=0)
    pos = make_issue(number=3, repo="SparkyFitness", priority=3)
    decisions = dispatch_policy.select_dispatchable(
        [neg, zero, pos],
        make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
        set(),
    )
afk: wire the T3 adapter to the REAL orchestration contract + fix priority
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:
- dispatch sends BARE commands keyed by `type` (not a `command` string), with
client-minted threadId/commandId/messageId + createdAt; the server replies
{sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
in), so dispatch ensures the repo's project (snapshot -> project.create iff
absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
running/in_progress/pending->running, errored->error), not a non-existent
top-level `status` field.
Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:27:00 +00:00
    assert _selected_numbers(decisions) == [1, 2, 3]  # -5 < 0 < 3 (most urgent first)
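The commit message above describes two adapter behaviours that are easy to get wrong: commands are bare objects discriminated by `type` with client-minted ids, and thread liveness is read from `latestTurn.state`. The sketch below is illustrative only. The field names `type`, `threadId`, `commandId`, `messageId`, `createdAt` and the state names come from the commit message; the helper names, the payload layout beyond those fields, the timestamp format, and the `"unknown"` fallback are assumptions, not the real T3 contract.

```python
import uuid
from datetime import datetime, timezone

# State mapping as listed in the commit message.
_LIVENESS = {
    "completed": "idle",
    "running": "running",
    "in_progress": "running",
    "pending": "running",
    "errored": "error",
}


def thread_liveness(thread: dict) -> str:
    # Read latestTurn.state rather than a top-level `status` field.
    latest_turn = thread.get("latestTurn") or {}
    # Assumption: an unrecognised state (or no turn yet) is reported as "unknown".
    return _LIVENESS.get(latest_turn.get("state"), "unknown")


def build_command(command_type: str, thread_id: str, **fields) -> dict:
    # Bare command keyed by `type`; ids and timestamp are minted client-side.
    return {
        "type": command_type,
        "threadId": thread_id,
        "commandId": str(uuid.uuid4()),
        "createdAt": datetime.now(timezone.utc).isoformat(),
        **fields,
    }


thread_id = str(uuid.uuid4())
cmd = build_command("thread.turn.start", thread_id, messageId=str(uuid.uuid4()))
```

Because the server is described as replying only with `{sequence}`, the caller keeps the `thread_id` it generated instead of parsing one out of the response.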


# --------------------------------------------------------------------------- #
# Reasons: human-readable, never parsed, but must be present and sensible.
# --------------------------------------------------------------------------- #
def test_every_decision_has_a_nonempty_reason(make_issue, make_config):
    issues = [
        make_issue(number=1, repo="infra", priority=3),
        make_issue(number=2, repo="realestate-crawler", priority=1),
    ]
    decisions = dispatch_policy.select_dispatchable(
        issues, make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    assert decisions  # sanity
    assert all(d.reason.strip() for d in decisions)


# --------------------------------------------------------------------------- #
# Combined matrix: every gate together. A single eligible needle in a haystack
# of issues that each trip exactly one gate.
# --------------------------------------------------------------------------- #
def test_only_the_fully_eligible_issue_survives_all_gates(make_issue, make_config):
    config = make_config(allowlist=["infra", "realestate-crawler"], kill_switch=False)
    in_flight = {"realestate-crawler"}  # this repo is locked

    issues = [
        make_issue(number=1, repo="infra", priority=5),  # ELIGIBLE
        make_issue(number=2, repo="not-allowed", priority=9),  # allowlist
        make_issue(number=3, repo="infra", priority=9, labeled_by_trusted=False),  # trust
        make_issue(number=4, repo="infra", priority=9, blocked_by=[1]),  # blocked
        make_issue(number=5, repo="realestate-crawler", priority=9),  # repo locked
    ]
    decisions = dispatch_policy.select_dispatchable(issues, config, in_flight)
    assert _selected_numbers(decisions) == [1]
    assert decisions[0].issue.repo == "infra"


@pytest.mark.parametrize("trusted", [True, False])
@pytest.mark.parametrize("allowed", [True, False])
@pytest.mark.parametrize("blocked", [True, False])
@pytest.mark.parametrize("locked", [True, False])
@pytest.mark.parametrize("killed", [True, False])
def test_full_eligibility_matrix(
    make_issue, make_config, trusted, allowed, blocked, locked, killed
):
    """Exhaustive truth table: an issue is dispatched iff ALL gates pass and the
    kill switch is off. 2**5 = 32 cases, single issue so ordering is moot."""
    issue = make_issue(
        number=1,
        repo="infra",
        priority=0,
        labeled_by_trusted=trusted,
        blocked_by=[99] if blocked else [],
    )
    config = make_config(
        allowlist=["infra"] if allowed else ["other-repo"],
        kill_switch=killed,
    )
    in_flight = {"infra"} if locked else set()

    decisions = dispatch_policy.select_dispatchable([issue], config, in_flight)

    should_dispatch = trusted and allowed and not blocked and not locked and not killed
    assert (len(decisions) == 1) is should_dispatch
    if should_dispatch:
        assert decisions[0].issue is issue
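The truth table above amounts to a conjunction of five gates. A minimal sketch of that predicate follows, assuming hypothetical stand-in `Issue` and `Config` dataclasses whose field names mirror the fixture keywords used in these tests. The project's real `select_dispatchable` also orders candidates and attaches reasons, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:  # hypothetical stand-in for the project's Issue type
    number: int
    repo: str
    labeled_by_trusted: bool = True
    blocked_by: tuple = ()


@dataclass(frozen=True)
class Config:  # hypothetical stand-in; defaults mirror "ships disabled"
    allowlist: frozenset = frozenset()
    kill_switch: bool = True


def is_dispatchable(issue, config, in_flight):
    # The kill switch short-circuits everything else.
    if config.kill_switch:
        return False
    return (
        issue.repo in config.allowlist       # allowlist gate
        and issue.labeled_by_trusted         # trust gate
        and not issue.blocked_by             # blocked gate
        and issue.repo not in in_flight      # per-repo lock gate
    )


armed = Config(allowlist=frozenset({"infra"}), kill_switch=False)
print(is_dispatchable(Issue(1, "infra"), armed, set()))  # True
print(is_dispatchable(Issue(1, "infra"), Config(), set()))  # False: default config is disarmed
```

Defaulting `Config` to a set kill switch and an empty allowlist means both gates must be changed on purpose before anything dispatches, which is the property the commit message calls out.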