t3-afk: dedicated in-cluster T3 Code instance (AFK executor + cockpit)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Slice #2 of claude-agent-service PRD #1 (AFK implementation pipeline). A dedicated in-cluster T3 Code instance that the control plane dispatches issues into; it runs the issue-implementer agent in a git worktree with a live cockpit. Applied and live 2026-06-14 (9 resources).

Pilot-fast: stock docker.io/library/node:24, with pinned t3@0.0.27 and the Claude CLI installed at startup onto an SSD-NFS PVC. Authentik-gated ingress.

issue-implementer behaviour ships as a user-level ~/.claude/CLAUDE.md (T3 hardcodes the system prompt; settingSources loads the file) and forbids plan mode and clarifying questions so unattended threads don't stall.

Keel-excluded (ADR 0003). wait_for_rollout=false (slow first start). The image is fully qualified for the Kyverno trusted-registries allowlist; the container memory limit is 4Gi (tier-aux LimitRange cap).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent bc7b28244f
commit d8c60d7ab8
3 changed files with 425 additions and 0 deletions
stacks/t3-afk/files/issue-implementer-CLAUDE.md (normal file, 59 lines)
@@ -0,0 +1,59 @@
# issue-implementer — autonomous AFK coding agent

You are **issue-implementer**, an autonomous agent that implements ONE GitHub
issue end-to-end and lands it, with no human at the keyboard. This file is your
standing behaviour; the specific task arrives as your prompt. You run inside a
T3 Code thread in `full-access` mode (skip-permissions) — there is no one to
answer questions mid-run.

## Autonomy — non-negotiable (you will hang otherwise)

- **Never enter plan mode and never call `ExitPlanMode`.** It is intercepted and
  will stall this thread forever.
- **Never ask clarifying questions / never call `AskUserQuestion`.** No human is
  watching. Make the most reasonable assumption, state it in a commit/your final
  message, and proceed.
- If you hit something you genuinely cannot resolve safely, **stop and write a
  precise blocker report as your final message** (what you tried, what's
  unresolved, what you'd need). Do not thrash. The orchestrator escalates it to a
  human — that is the only "ask for help" channel you have.

## What to do

1. **Understand the task.** Your prompt contains the issue (number, what to
   build, acceptance criteria). Read the issue's AGENT-BRIEF if present.
2. **Work in the prepared worktree.** You are already in a git worktree on a
   branch off `master`. Read the repo's own `CLAUDE.md`, `CONTEXT.md`, and any
   `docs/adr/` in the area you touch — use its domain vocabulary and respect its
   decisions.
3. **Test-first (TDD).** Write a failing test that captures the desired
   behaviour, make it pass, then refactor. Prefer property/parameterized tests.
   Run the repo's actual test suite and get it green before you commit. Do not
   test implementation details — test external behaviour.
4. **Commit.** Subject = what changed; body = why, paraphrasing the issue in
   plain words. Include `Closes #<issue-number>` and the trailer
   `Implemented-by: issue-implementer (AFK)`. Stage files by name — never
   `git add -A`/`.`. Never skip hooks.
5. **Land it.** Push your branch to `master` (`git push origin HEAD:master`). If
   the push is rejected non-fast-forward, fetch, merge `origin/master`, re-run
   the tests, and push again. Pushing to `master` is the intended behaviour —
   CI builds and deploys from there.
6. **Report.** Your final message is a concise summary: what you built, the
   commit, and anything a reviewer should know. (CI/deploy watching and any
   fix-forward/freeze handling are done by the control plane, not by you — once
   you've pushed green code, your job is done.)

## Guardrails (hard limits)

- **Never force-push** to `master`.
- **Never delete PVCs/PVs**, drop database tables, or run destructive data ops.
- **Never edit Vault directly**, and never commit secrets.
- **Infrastructure changes go through Terraform/Terragrunt only** — never
  `kubectl apply/edit/patch` as the final state.
- **Never use `[ci skip]`** — it hides the change from the audit feed.
- Stay within the issue's scope. Don't refactor adjacent code beyond what the
  task needs.

## Done means

Tests green **and** pushed to `master`. Not "code written" — landed.
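Step 5 of the agent instructions ("Land it") describes a push-and-retry loop: push, and on rejection fetch, merge `origin/master`, re-run the tests, and push again. A minimal shell sketch of that loop follows. The function name `land_on_master`, the `MAX_ATTEMPTS` variable, and passing the test command as an argument are illustrative assumptions; the commit itself does not ship such a helper.

```shell
# Sketch of step 5: push HEAD to master; if the push is rejected, merge
# origin/master, re-run the tests, and try again.
# land_on_master and MAX_ATTEMPTS are hypothetical names, not part of the commit.
land_on_master() {
  test_cmd="$1"                       # command that runs the repo's test suite
  max_attempts="${MAX_ATTEMPTS:-3}"   # assumed retry budget
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if git push origin HEAD:master; then
      echo "landed on attempt ${attempt}"
      return 0
    fi
    # Push rejected (typically non-fast-forward): integrate and re-verify.
    git fetch origin master || return 1
    git merge --no-edit origin/master || return 1   # conflict: stop and report a blocker
    sh -c "$test_cmd" || return 1                   # never push red code
    attempt=$((attempt + 1))
  done
  return 1
}
```

A call such as `land_on_master "make test"` would return non-zero on a merge conflict or a failing suite, which matches the instruction to stop and write a blocker report rather than thrash.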
stacks/t3-afk/main.tf (normal file, 348 lines)
@@ -0,0 +1,348 @@
# =============================================================================
# t3-afk — dedicated, in-cluster T3 Code instance: the EXECUTOR + COCKPIT for the
# AFK implementation pipeline (slice #2 of claude-agent-service PRD #1).
#
# claude-agent-service (control plane) dispatches issues INTO this T3 instance
# over its orchestration HTTP API; T3 runs the issue-implementer agent in a git
# worktree and shows every worker in its cockpit. See:
# claude-agent-service/docs/2026-06-14-afk-implementation-pipeline-design.md
# claude-agent-service/docs/adr/0003-t3-thin-executor-and-cockpit.md
#
# PILOT SHORTCUT (chosen 2026-06-14): no custom-built image. We run stock
# `node:24` (the full image ships git + python3/make/g++ for node-pty) and an
# init container installs npm packages (t3 pinned to 0.0.27, plus the Claude
# CLI) onto the SSD PVC, cached across restarts. Formalize a digest-pinned built
# image post-GO. T3 is version-pinned (npm) and NOT Keel-enrolled.
# =============================================================================

# No plan-time Vault reads — every secret flows through the ExternalSecret below
# (CLAUDE_CODE_OAUTH_TOKEN / GITHUB_TOKEN / FORGEJO_TOKEN), injected as env at
# runtime. Nothing here needs a secret value at plan time.

# Wildcard TLS secret name — value comes from config.tfvars; consumed by the
# ingress factory (every stack that uses the factory declares this).
variable "tls_secret_name" {}

locals {
  namespace = "t3-afk"
  # Stock node base — the FULL node:24 (not -slim) is buildpack-deps-based, so it
  # ships git + build-essential (python3/make/g++) that node-pty + the agent need.
  # Fully-qualified (docker.io/library/...) to satisfy the Kyverno
  # require-trusted-registries allowlist via `docker.io/*` — bare `node*` is NOT
  # on the bare-DockerHub-library list (alpine*/busybox*/python* are).
  image = "docker.io/library/node:24"
  # npm versions installed at startup (the reproducibility anchor for the pilot
  # until a digest-pinned image exists). Only t3 is pinned to an exact version;
  # the Claude CLI currently tracks `latest`, so it is not yet reproducible.
  t3_version = "0.0.27"
  claude_cli_version = "latest" # @anthropic-ai/claude-code
  labels = {
    app = "t3-afk"
  }
}

# --- Namespace ---

resource "kubernetes_namespace" "t3_afk" {
  metadata {
    name = local.namespace
    labels = {
      tier = local.tiers.aux
    }
  }
}

# --- Secrets ---
# The Claude provider authenticates with CLAUDE_CODE_OAUTH_TOKEN (T3 passes the
# environment straight through to the embedded claude-agent-sdk + claude CLI).
# GITHUB_TOKEN / FORGEJO_TOKEN authenticate the agent's `git push` from worktrees
# (wired into ~/.gitconfig insteadOf rewrites in the container command).

resource "kubernetes_manifest" "external_secret" {
  manifest = {
    apiVersion = "external-secrets.io/v1beta1"
    kind = "ExternalSecret"
    metadata = {
      name = "t3-afk-secrets"
      namespace = local.namespace
    }
    spec = {
      refreshInterval = "15m"
      secretStoreRef = {
        name = "vault-kv"
        kind = "ClusterSecretStore"
      }
      target = { name = "t3-afk-secrets" }
      data = [
        {
          secretKey = "CLAUDE_CODE_OAUTH_TOKEN"
          remoteRef = { key = "claude-agent-service", property = "claude_oauth_token" }
        },
        {
          secretKey = "GITHUB_TOKEN"
          remoteRef = { key = "viktor", property = "github_pat" }
        },
        {
          # Shared viktor-scoped admin PAT (also used by Woodpecker + the
          # claude-agent pod). Lets the agent git push / open PRs on Forgejo.
          secretKey = "FORGEJO_TOKEN"
          remoteRef = { key = "ci/global", property = "forgejo_push_token" }
        },
      ]
    }
  }
  depends_on = [kubernetes_namespace.t3_afk]
}
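
The Secrets comment says the tokens are wired into `~/.gitconfig` `insteadOf` rewrites so that both HTTPS and SSH-style GitHub URLs resolve to a token-authenticated HTTPS URL. The following is a minimal sketch of that wiring, not the stack's actual command. The function name `configure_github_rewrites` is hypothetical, and it writes to a scratch file via `--file` instead of `--global` so it can be tried safely. Because both rewrites live under the same `url.<base>.insteadOf` key, the second value is registered with `--add`; a plain `git config <key> <value>` replaces an existing single value.

```shell
# Sketch: register two insteadOf rewrites under one url.<base> key.
# configure_github_rewrites is a hypothetical helper; $1 is the config file,
# $2 is the token (a dummy value is fine for experimenting).
configure_github_rewrites() {
  cfg="$1"
  token="$2"
  key="url.https://${token}@github.com/.insteadOf"
  # First value: rewrite plain HTTPS URLs.
  git config --file "$cfg" "$key" "https://github.com/"
  # Second value: --add appends instead of replacing the first one.
  git config --file "$cfg" --add "$key" "git@github.com:"
}
```

Running `git config --file <cfg> --get-all "url.https://<token>@github.com/.insteadOf"` afterwards lists both rewrite sources, one per line.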

# issue-implementer behaviour. T3 hardcodes the claude_code system-prompt preset
# (no API override), but loads settingSources [user,project,local] — so the
# agent's standing instructions ride in the USER-level ~/.claude/CLAUDE.md, while
# each target repo's own CLAUDE.md provides project context. ADR 0003.
resource "kubernetes_config_map" "agent_claudemd" {
  metadata {
    name = "issue-implementer-claudemd"
    namespace = kubernetes_namespace.t3_afk.metadata[0].name
  }
  data = {
    "CLAUDE.md" = file("${path.module}/files/issue-implementer-CLAUDE.md")
  }
}

# --- Storage ---
# SSD-NFS (small-file friendly) for the T3 base dir: state.sqlite + the
# server-signing-key (losing it invalidates every issued bearer), per-thread git
# worktrees, the npm global install, and caches. ADR 0004.
module "data" {
  source = "../../modules/kubernetes/nfs_volume"
  name = "t3-afk-data"
  namespace = kubernetes_namespace.t3_afk.metadata[0].name
  nfs_server = "192.168.1.127"
  nfs_path = "/srv/nfs-ssd/t3-afk-data"
  storage = "30Gi"
}

# --- Deployment ---

resource "kubernetes_deployment" "t3_afk" {
  # Slow first start (image pull + npm install init + ESO secret sync) can
  # exceed the default rollout-wait timeout; verify pod readiness out-of-band.
  wait_for_rollout = false

  metadata {
    name = "t3-afk"
    namespace = kubernetes_namespace.t3_afk.metadata[0].name
    labels = local.labels
  }

  spec {
    replicas = 1
    # Single-writer state.sqlite — never run two pods against the same base dir.
    strategy {
      type = "Recreate"
    }

    selector {
      match_labels = local.labels
    }

    template {
      metadata {
        labels = merge(local.labels, {
          # Belt-and-braces: this namespace isn't Keel-enrolled, but pin the
          # churny pre-1.0 T3 explicitly out of any auto-upgrade. ADR 0003.
          "keel.sh/policy" = "never"
        })
      }

      spec {
        security_context {
          run_as_user = 1000 # node
          run_as_group = 1000
          fs_group = 1000
        }

        # NFS mounts land root-owned; make /data writable by uid 1000.
        init_container {
          name = "fix-perms"
          image = "busybox:1.37"
          command = ["sh", "-c", "mkdir -p /data && chown -R 1000:1000 /data && chmod 0775 /data"]
          security_context {
            run_as_user = 0
          }
          volume_mount {
            name = "data"
            mount_path = "/data"
          }
          resources {
            requests = { memory = "32Mi" }
            limits = { memory = "64Mi" }
          }
        }

        # Install t3 (pinned) + the Claude CLI onto the PVC (cached; skipped if
        # already present). Runs as uid 1000 so the install is owned by the
        # runtime user.
        init_container {
          name = "install-t3"
          image = local.image
          command = ["bash", "-c", <<-EOF
            set -e
            export npm_config_cache=/data/npm-cache
            export npm_config_prefix=/data/npm-global
            mkdir -p /data/npm-global /data/npm-cache
            if [ ! -x /data/npm-global/bin/t3 ]; then
              echo "installing t3@${local.t3_version} + claude CLI ..."
              npm install -g "t3@${local.t3_version}" "@anthropic-ai/claude-code@${local.claude_cli_version}"
            else
              echo "t3 already installed: $(/data/npm-global/bin/t3 --version 2>/dev/null || echo unknown)"
            fi
          EOF
          ]
          volume_mount {
            name = "data"
            mount_path = "/data"
          }
          resources {
            requests = { cpu = "200m", memory = "512Mi" }
            limits = { memory = "1Gi" }
          }
        }

        container {
          name = "t3"
          image = local.image

          # Configure git auth for the agent's pushes, then run T3 headless.
          # $${...} escapes Terraform interpolation so the shell expands the env
          # vars; a bare $VAR (no braces) needs no escaping, and `$$VAR` would
          # reach the shell as the PID followed by the literal name.
          command = ["bash", "-c", <<-EOF
            set -e
            export PATH=/data/npm-global/bin:$PATH
            export npm_config_cache=/data/npm-cache

            # git identity + token rewrites so the agent can push from worktrees.
            git config --global user.name "issue-implementer (AFK)"
            git config --global user.email "afk-agent@viktorbarzin.me"
            git config --global url."https://$${GITHUB_TOKEN}@github.com/".insteadOf "https://github.com/"
            # --add keeps both values; a second plain set on the same key would replace the first.
            git config --global --add url."https://$${GITHUB_TOKEN}@github.com/".insteadOf "git@github.com:"
            if [ -n "$${FORGEJO_TOKEN}" ]; then
              git config --global url."https://$${FORGEJO_TOKEN}@forgejo.viktorbarzin.me/".insteadOf "https://forgejo.viktorbarzin.me/"
            fi

            exec t3 serve --mode web --host 0.0.0.0 --port 3773 --base-dir /data/t3
          EOF
          ]

          port {
            container_port = 3773
          }

          env_from {
            secret_ref {
              name = "t3-afk-secrets"
            }
          }

          env {
            name = "HOME"
            value = "/home/node"
          }
          env {
            name = "T3CODE_HOME"
            value = "/data/t3"
          }

          # T3's API needs auth even for liveness; use a TCP probe on the port.
          liveness_probe {
            tcp_socket {
              port = 3773
            }
            initial_delay_seconds = 30
            period_seconds = 30
          }
          readiness_probe {
            tcp_socket {
              port = 3773
            }
            initial_delay_seconds = 15
            period_seconds = 10
          }

          volume_mount {
            name = "data"
            mount_path = "/data"
          }
          # User-level agent instructions (settingSources: user).
          volume_mount {
            name = "agent-claudemd"
            mount_path = "/home/node/.claude/CLAUDE.md"
            sub_path = "CLAUDE.md"
          }

          # Burstable (tier-aux). A live agent thread (node + claude) is
          # memory-heavy; size for a small number of concurrent threads on this
          # pilot instance. No CPU limit per cluster policy.
          resources {
            requests = {
              cpu = "1"
              memory = "2Gi"
            }
            # Capped at the tier-aux LimitRange max (4Gi/container). If real
            # workloads OOM, opt the namespace out via the
            # resource-governance/custom-limitrange label (as claude-agent-service
            # does) and raise this.
            limits = {
              memory = "4Gi"
            }
          }
        }

        volume {
          name = "data"
          persistent_volume_claim {
            claim_name = module.data.claim_name
          }
        }

        volume {
          name = "agent-claudemd"
          config_map {
            name = kubernetes_config_map.agent_claudemd.metadata[0].name
          }
        }
      }
    }
  }

  lifecycle {
    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
  }
}
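
The `install-t3` init container skips the install whenever `/data/npm-global/bin/t3` already exists, so on a warm PVC a change to `t3_version` alone would not appear to trigger a reinstall. One possible way to make the cache version-aware is sketched below. This is an illustration under stated assumptions, not what the stack does: the marker file `.t3-version` and the helper names `needs_install` and `record_install` are hypothetical.

```shell
# needs_install PREFIX WANTED: exits 0 when an install is required.
# Hypothetical helper; the .t3-version marker file is an assumption.
needs_install() {
  prefix="$1"
  wanted="$2"
  marker="${prefix}/.t3-version"
  [ -x "${prefix}/bin/t3" ] || return 0   # binary missing: install
  [ -f "$marker" ] || return 0            # no record of what is on disk: install
  [ "$(cat "$marker")" != "$wanted" ]     # install only if the version differs
}

# record_install PREFIX VERSION: remember which version is on disk.
record_install() {
  printf '%s\n' "$2" > "$1/.t3-version"
}
```

In the init script this could guard the existing `npm install -g` call: run it when `needs_install /data/npm-global "0.0.27"` succeeds, then call `record_install /data/npm-global "0.0.27"`.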

# --- Service ---

resource "kubernetes_service" "t3_afk" {
  metadata {
    name = "t3-afk"
    namespace = kubernetes_namespace.t3_afk.metadata[0].name
    labels = local.labels
  }
  spec {
    selector = local.labels
    port {
      port = 3773
      target_port = 3773
    }
    type = "ClusterIP"
  }
}

# --- Ingress ---
# The cockpit has no built-in user auth, so Authentik forward-auth is the gate.
module "ingress" {
  source = "../../modules/kubernetes/ingress_factory"
  auth = "required"
  dns_type = "proxied"
  namespace = kubernetes_namespace.t3_afk.metadata[0].name
  name = "t3-afk"
  service_name = kubernetes_service.t3_afk.metadata[0].name
  port = 3773
  tls_secret_name = var.tls_secret_name
}
stacks/t3-afk/terragrunt.hcl (normal file, 18 lines)
@@ -0,0 +1,18 @@
include "root" {
  path = find_in_parent_folders()
}

dependency "platform" {
  config_path = "../platform"
  skip_outputs = true
}

dependency "vault" {
  config_path = "../vault"
  skip_outputs = true
}

dependency "external-secrets" {
  config_path = "../external-secrets"
  skip_outputs = true
}