claude-agent-service/docs/adr/0002-afk-autonomous-merge-and-failure-posture.md

# AFK agents push straight to master; failures fix-forward then freeze, not revert

The AFK implementation pipeline (see
`docs/2026-06-14-afk-implementation-pipeline-design.md`) lets an autonomous
agent land code with no human at the keyboard. The owner deliberately chose the
most hands-off posture: **AFK-written code pushes straight to `master`** (which
then deploys via the existing CI/CD chain) with **no pull-request review gate**,
and when a deploy breaks, the agent **fixes forward and then freezes the broken
state** rather than auto-reverting. This ADR records that risk posture and why it
was chosen over the safer alternatives, because it is surprising and not cheap to
walk back once callers and habits depend on it.

## Status

accepted (2026-06-14) — posture decided; enforced once the pipeline ships
(pilot-gated).

## Context

`master` on every enrolled repo deploys continuously (GHA build → ghcr →
Woodpecker → Keel). So "where AFK code lands" is really "what reaches a live
deploy without a human looking". The owner weighed three merge gates and three
post-push failure responses and picked the autonomy-maximizing end of both,
accepting the blast radius explicitly.

## Considered options — merge gate

- **Always push to master (chosen).** Tests-green is the gate; CI + rollback are
  the safety net. Matches the existing human allow-then-audit model (non-admins
  already push straight to master). Most hands-off.
- **Adaptive (push if confident, else PR)** — rejected as the *default* though it
  is what `issue-responder` does; the owner wanted full hands-off, not a
  confidence-gated PR for otherwise-working code.
- **Always open a PR** — rejected: reintroduces a human merge step on every
  issue, i.e. "AFK implementation, human merge" — not the goal.

## Considered options — post-push failure (CI/rollout goes red after a green push)

- **Fix-forward then freeze (chosen).** Iterate with corrective commits up to
  **5 attempts or 60 minutes**; if still red, **leave the broken state in place**
  (do not revert), relabel the issue `ready-for-human`, and hard-page. Same
  forensics-first instinct as the breakglass (ADR 0001): preserve the exact
  failing state for debugging rather than auto-cleaning it away.
- **Auto-revert + escalate** — rejected (was the recommendation): restores green
  fastest, but destroys the forensic state the owner wants to inspect.
- **Alert and freeze immediately (no fix-forward)** — rejected: gives up on
  transient/env-drift failures a corrective commit would clear.

Pre-push failure (can't reach green, blocked, or would need a disallowed op) is
not a dilemma: the agent does **not** push, relabels `ready-for-human`, comments
what it tried, and pages.

## Consequences

- An unreviewed logic error can deploy before any human sees it; rollback (not
  review) is the safety net. Bounded by: tests-as-gate, the start-small
  allowlist, the per-repo lock, and the kill switch.
- A frozen-broken deploy can sit unhealthy until the owner answers the page —
  availability is traded for debuggability, by explicit choice. Acceptable
  because enrolled repos are non-critical by the allowlist prerequisite, and the
  owner is paged hard (Slack + ntfy).
- Fix-forward can stack up to 5 commits on a bad change before freezing; the
  60-minute cap bounds the churn window.
- Per-issue spend is capped at `max_budget_usd = 100`.
- Guardrails still hold underneath this posture: no PVC/PV deletes, no direct
  Vault edits, no force-push, infra changes Terraform-only, never `[ci skip]`.
- Reversible: tightening to adaptive/PR or to auto-revert is a config + watcher
  change, not a re-architecture — but callers/habits will have formed around
  "it just lands", so flag loudly if reversing.
docs: capture AFK implementation pipeline design + ADRs 0002-0004 Record the architecture for moving code implementation AFK, decided in a design/grilling session. The owner wants the human-in-the-loop boundary to stop at design + spec: once an issue is triaged ready-for-agent, an agent should implement it test-first, push it, and see it to a healthy deploy on its own, escalating only when it can't proceed. Decisions captured: - claude-agent-service is the control plane (poller + watcher + safety); a dedicated in-cluster T3 Code instance is the executor + cockpit, because T3 can only show sessions it launched itself -> we dispatch into it (ADR 0003). - AFK code pushes straight to master; on a broken deploy it fix-forwards then freezes the broken state for forensics rather than reverting (ADR 0002). - Implementation agents use persistent per-repo checkouts + git worktrees on SSD-NFS for warm caches, reversing the throwaway-clone rule for this path because concurrency is serial-within-repo (ADR 0004). Pilot-gated: five integration unknowns must be validated against a dedicated T3 instance before the poller is wired. No code yet. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> 2026-06-14 19:09:12 +00:00			`# AFK agents push straight to master; failures fix-forward then freeze, not revert`

			`The AFK implementation pipeline (see`
			`docs/2026-06-14-afk-implementation-pipeline-design.md`) lets an autonomous
			`agent land code with no human at the keyboard. The owner deliberately chose the`
			most hands-off posture: AFK-written code pushes straight to `master` (which
			`then deploys via the existing CI/CD chain) with no pull-request review gate,`
			`and when a deploy breaks, the agent **fixes forward and then freezes the broken`
			`state** rather than auto-reverting. This ADR records that risk posture and why it`
			`was chosen over the safer alternatives, because it is surprising and not cheap to`
			`walk back once callers and habits depend on it.`

			`## Status`

			`accepted (2026-06-14) — posture decided; enforced once the pipeline ships`
			`(pilot-gated).`

			`## Context`

			`master` on every enrolled repo deploys continuously (GHA build → ghcr →
			`Woodpecker → Keel). So "where AFK code lands" is really "what reaches a live`
			`deploy without a human looking". The owner weighed three merge gates and three`
			`post-push failure responses and picked the autonomy-maximizing end of both,`
			`accepting the blast radius explicitly.`

			`## Considered options — merge gate`

			`- Always push to master (chosen). Tests-green is the gate; CI + rollback are`
			`the safety net. Matches the existing human allow-then-audit model (non-admins`
			`already push straight to master). Most hands-off.`
			`- Adaptive (push if confident, else PR) — rejected as the default though it`
			is what `issue-responder` does; the owner wanted full hands-off, not a
			`confidence-gated PR for otherwise-working code.`
			`- Always open a PR — rejected: reintroduces a human merge step on every`
			`issue, i.e. "AFK implementation, human merge" — not the goal.`

			`## Considered options — post-push failure (CI/rollout goes red after a green push)`

			`- Fix-forward then freeze (chosen). Iterate with corrective commits up to`
			`5 attempts or 60 minutes; if still red, leave the broken state in place`
			(do not revert), relabel the issue `ready-for-human`, and hard-page. Same
			`forensics-first instinct as the breakglass (ADR 0001): preserve the exact`
			`failing state for debugging rather than auto-cleaning it away.`
			`- Auto-revert + escalate — rejected (was the recommendation): restores green`
			`fastest, but destroys the forensic state the owner wants to inspect.`
			`- Alert and freeze immediately (no fix-forward) — rejected: gives up on`
			`transient/env-drift failures a corrective commit would clear.`

			`Pre-push failure (can't reach green, blocked, or would need a disallowed op) is`
			not a dilemma: the agent does not push, relabels `ready-for-human`, comments
			`what it tried, and pages.`

			`## Consequences`

			`- An unreviewed logic error can deploy before any human sees it; rollback (not`
			`review) is the safety net. Bounded by: tests-as-gate, the start-small`
			`allowlist, the per-repo lock, and the kill switch.`
			`- A frozen-broken deploy can sit unhealthy until the owner answers the page —`
			`availability is traded for debuggability, by explicit choice. Acceptable`
			`because enrolled repos are non-critical by the allowlist prerequisite, and the`
			`owner is paged hard (Slack + ntfy).`
			`- Fix-forward can stack up to 5 commits on a bad change before freezing; the`
			`60-minute cap bounds the churn window.`
			- Per-issue spend is capped at `max_budget_usd = 100`.
			`- Guardrails still hold underneath this posture: no PVC/PV deletes, no direct`
			Vault edits, no force-push, infra changes Terraform-only, never `[ci skip]`.
			`- Reversible: tightening to adaptive/PR or to auto-revert is a config + watcher`
			`change, not a re-architecture — but callers/habits will have formed around`
			`"it just lands", so flag loudly if reversing.`