[beads-server] Auto-dispatch agent beads via CronJobs
## Context
Until now, handing work to the in-cluster `beads-task-runner` agent required
opening BeadBoard and clicking the manual Dispatch button on each bead. We
want users to be able to describe work as a bead, set `assignee=agent`, and
have the agent pick it up within a couple of minutes — no clicks.

The existing pieces already provide everything we need:
- `claude-agent-service` exposes `/execute` with a single-slot `asyncio.Lock`
- BeadBoard's `/api/agent-dispatch` builds the prompt and forwards the bearer token
- BeadBoard's `/api/agent-status` reports `busy` via a cached `/health` poll
- Dolt stores beads and is already in-cluster at `dolt.beads-server:3306`

So the only missing component is a poller that ties them together. This
commit adds that poller as two Kubernetes CronJobs — matching the existing
infra pattern (OpenClaw task-processor, certbot-renewal, backups) rather than
introducing n8n or in-service polling.
## Flow
```
user: bd assign <id> agent
        │
        ▼
Dolt @ dolt.beads-server.svc:3306  ◄──── every 2 min ────┐
        │                                                │
        ▼                                                │
CronJob: beads-dispatcher                                │
  1. GET beadboard/api/agent-status (busy? skip)         │
  2. bd query 'assignee=agent AND status=open'           │
  3. bd update -s in_progress (claim)                    │
  4. POST beadboard/api/agent-dispatch                   │
  5. bd note "dispatched: job=…"                         │
        │                                                │
        ▼                                                │
claude-agent-service /execute                            │
  beads-task-runner agent runs; notes/closes bead        │
        │                                                │
        ▼                                                │
done ──► next tick picks up the next bead ───────────────┘

CronJob: beads-reaper (every 10 min)
  for bead (assignee=agent, status=in_progress, updated_at > 30 min):
    bd note "reaper: no progress for Nm — blocking"
    bd update -s blocked
```
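The selection part of the dispatcher tick (steps 1 and 2 above) can be sketched in Python, which the reused image already ships. This is an illustration under assumptions, not the shipped shell script: `pick_bead` and `SAMPLE_BEADS` are hypothetical names, and the fields (`acceptance_criteria`, `priority`, `updated_at`) mirror what the dispatcher's `jq` filter reads.

```python
from typing import Optional


def pick_bead(beads: list[dict], agent_busy: bool) -> Optional[str]:
    """Return the id a dispatcher tick would claim, or None to skip the tick."""
    if agent_busy:
        # The single execution slot is taken, so this tick does nothing.
        return None
    eligible = [
        b for b in beads
        if b.get("assignee") == "agent"
        and b.get("status") == "open"
        and b.get("acceptance_criteria")  # empty or missing criteria are skipped
    ]
    if not eligible:
        return None
    # Ascending priority, then oldest updated_at, like sort_by(.priority, .updated_at).
    eligible.sort(key=lambda b: (b["priority"], b["updated_at"]))
    return eligible[0]["id"]


# Hypothetical sample records, shaped like the fields the script reads.
SAMPLE_BEADS = [
    {"id": "code-1", "assignee": "agent", "status": "open", "priority": 2,
     "updated_at": "2024-01-01T10:00:00Z", "acceptance_criteria": "tests pass"},
    {"id": "code-2", "assignee": "agent", "status": "open", "priority": 1,
     "updated_at": "2024-01-01T11:00:00Z", "acceptance_criteria": "note added"},
    {"id": "code-3", "assignee": "agent", "status": "open", "priority": 0,
     "updated_at": "2024-01-01T09:00:00Z", "acceptance_criteria": ""},
]

print(pick_bead(SAMPLE_BEADS, agent_busy=False))
```

With the sample data this prints `code-2`: `code-3` sorts first on priority but is skipped because its acceptance criteria are empty.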
## Decisions
- **Sentinel assignee `agent`** — free-form, no Beads schema change. Any bd
client can set it (`bd assign <id> agent`).
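- **Sequential dispatch** — matches the service's `asyncio.Lock`. With a
~5-min average run the single slot caps throughput at ~12 beads/hour; because
the next bead waits for the following 2-min tick, the practical rate is
closer to ~10 beads/hour. Parallelism is a separate plan.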
- **Fixed agent `beads-task-runner`** — read-only rails, matches the manual
Dispatch button. Broader-privilege agents stay manual via BeadBoard UI.
- **Image reuse** — the claude-agent-service image already ships `bd`, `jq`,
`curl`; a new CronJob-specific image would duplicate 400MB of infra tooling.
The tag is mirrored in this stack as `claude_agent_service_image_tag`; bump
it whenever the image is rebuilt.
- **ConfigMap-mounted `metadata.json`** — declarative TF rather than reusing
the image-seeded file. The script copies it into `/tmp/.beads/` because bd
may touch the parent dir and ConfigMap mounts are read-only.
- **Kill switch (`beads_dispatcher_enabled`)** — single bool, default true.
When false, `suspend: true` on both CronJobs; manual Dispatch keeps working.
- **Reaper threshold 30 min** — `bd note` bumps `updated_at`, so a well-behaved
`beads-task-runner` never trips the reaper. Failures trip it; pod crashes
(in-memory job state lost) also trip it.
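As a sanity check on the sequential-dispatch throughput figure, here is a back-of-the-envelope model in Python. It is a sketch under simplifying assumptions: `beads_per_hour` is a hypothetical helper, every run takes exactly the average time, and the cached `/health` poll and job start-up latency are ignored.

```python
import math


def beads_per_hour(run_min: float, poll_min: float) -> float:
    """Steady-state throughput of one slot fed by a fixed-interval poller."""
    # A new bead can only start on a tick, so each cycle is the run time
    # rounded up to the next multiple of the poll interval.
    cycle_min = math.ceil(run_min / poll_min) * poll_min
    return 60 / cycle_min


upper_bound = 60 / 5                 # slot never idle between runs
with_polling = beads_per_hour(5, 2)  # 5-min run, 2-min ticks: 6-min cycle

print(upper_bound, with_polling)
```

Under these assumptions the slot-bound ceiling is 12 beads/hour, and waiting for the next 2-min tick lowers the steady-state rate to 10 beads/hour.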
## What is NOT in this change
- No Terraform apply — requires Vault OIDC + cluster access. Apply manually:
`cd infra/stacks/beads-server && scripts/tg apply`
- No change to `claude-agent-service/` (already ships bd/jq/curl)
- No change to `beadboard/` (`/api/agent-dispatch` + `/api/agent-status` reused)
- No change to the `beads-task-runner` agent definition (rails unchanged)
- Parallelism: single-slot is MVP; multi-slot dispatch is a separate plan.
## Deviations from plan
Minor, documented in code comments:
- Reaper uses `.updated_at` instead of the plan's `.notes[].created_at`. bd
serializes `notes` as a string (not an array), and every `bd note` bumps
`updated_at` — equivalent for the reaper's purpose.
- ISO-8601 parsed via `python3`, not `date -d` — Alpine's busybox lacks GNU
`-d` and the image has python3.
- `HOME=/tmp` set as a safety net — bd may try to write state/lock files.
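The reaper's staleness test, built on `updated_at` and Python's ISO-8601 parsing as described above, can be sketched as follows. This is illustrative only: `age_minutes`, `is_stale` and the sample records are hypothetical, and the timestamps are assumed to be UTC with a trailing `Z`.

```python
from datetime import datetime, timezone

THRESHOLD_MIN = 30  # same threshold the reaper uses


def age_minutes(updated_at: str, now: datetime) -> int:
    """Whole minutes elapsed since an ISO-8601 `updated_at` timestamp."""
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalise it.
    last = datetime.fromisoformat(updated_at.replace("Z", "+00:00"))
    return int((now - last).total_seconds()) // 60


def is_stale(bead: dict, now: datetime) -> bool:
    """True when the reaper would flip this bead to `blocked`."""
    return (
        bead["assignee"] == "agent"
        and bead["status"] == "in_progress"
        and age_minutes(bead["updated_at"], now) > THRESHOLD_MIN
    )


# Hypothetical records and a fixed clock so the result is reproducible.
NOW = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
FRESH = {"assignee": "agent", "status": "in_progress",
         "updated_at": "2024-01-01T11:45:00Z"}
STALE = {"assignee": "agent", "status": "in_progress",
         "updated_at": "2024-01-01T11:15:00Z"}

print(is_stale(FRESH, NOW), is_stale(STALE, NOW))
```

The comparison is strict, so a bead that is exactly 30 minutes old is kept for one more reaper tick.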
## Test plan
### Automated
```
$ cd infra/stacks/beads-server && terraform init -backend=false
Terraform has been successfully initialized!
$ terraform validate
Warning: Deprecated Resource (kubernetes_namespace → v1) # pre-existing, unrelated
Success! The configuration is valid, but there were some validation warnings as shown above.
$ terraform fmt stacks/beads-server/main.tf
# (no output — already formatted)
```
### Manual verification
1. **Apply**
```
vault login -method=oidc
cd infra/stacks/beads-server
scripts/tg apply
```
Expect: `kubernetes_config_map.beads_metadata`,
`kubernetes_cron_job_v1.beads_dispatcher`, `kubernetes_cron_job_v1.beads_reaper`
created. No changes to existing resources.
2. **CronJobs exist with right schedule**
```
kubectl -n beads-server get cronjob
```
Expect `beads-dispatcher */2 * * * *` and `beads-reaper */10 * * * *`,
both with `SUSPEND=False`.
3. **End-to-end smoke**
```
bd create "auto-dispatch smoke test" \
-d "Read /etc/hostname inside the agent sandbox and close." \
--acceptance "bd note includes 'hostname=' line and bead is closed."
bd assign <new-id> agent
# within 2 min:
bd show <new-id> --json | jq '{status, notes}'
```
Expect notes to contain `auto-dispatcher claimed at …` and
`dispatched: job=<uuid>`, status `in_progress`.
4. **Reaper smoke**
Assign + dispatch a long bead, then
`kubectl -n claude-agent delete pod -l app=claude-agent-service`. Within
30 min + one reaper tick, `bd show <id>` shows `blocked` with a
`reaper: no progress for Nm — blocking` note.
5. **Kill switch**
```
cd infra/stacks/beads-server
scripts/tg apply -var=beads_dispatcher_enabled=false
kubectl -n beads-server get cronjob
```
Expect `SUSPEND=True` on both CronJobs. Assign a bead to `agent`; verify
nothing happens within 5 min. Re-apply with `=true` to re-enable.
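The expectation in step 3 can also be checked mechanically. The sketch below assumes the object printed by `bd show <new-id> --json | jq '{status, notes}'` and relies on `notes` being a single string; `dispatch_recorded` and the sample payload (including its job id) are hypothetical.

```python
import json


def dispatch_recorded(bead_json: str) -> bool:
    """Check a `{status, notes}` object for the two dispatcher notes."""
    bead = json.loads(bead_json)
    notes = bead.get("notes") or ""  # bd serializes notes as one string
    return (
        bead.get("status") == "in_progress"
        and "auto-dispatcher claimed at" in notes
        and "dispatched: job=" in notes
    )


# Hypothetical payload shaped like the jq output in step 3.
SAMPLE = json.dumps({
    "status": "in_progress",
    "notes": "auto-dispatcher claimed at 2024-01-01T12:00:00Z\n"
             "dispatched: job=example-job-id",
})

print(dispatch_recorded(SAMPLE))
```

A bead that is still `open`, or one that was claimed but never received the `dispatched: job=` note, fails the check.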

The runbook at `infra/docs/runbooks/beads-auto-dispatch.md` covers all of the
above, plus reaper semantics and design choices.
Closes: code-8sm
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent: 01955916b2
Commit: 1a7f68fe5b

2 changed files with 472 additions and 0 deletions

docs/runbooks/beads-auto-dispatch.md (185 lines, normal file)
@@ -0,0 +1,185 @@
# Beads Auto-Dispatch Runbook

Users can hand work to the headless `beads-task-runner` agent by assigning a
bead to the sentinel user `agent`. Two CronJobs in the `beads-server`
namespace drive the pipeline:

- **`beads-dispatcher`** — every 2 min: picks up the highest-priority
  `assignee=agent`/`status=open` bead with non-empty acceptance criteria,
  claims it by flipping to `in_progress`, and POSTs it to BeadBoard's
  `/api/agent-dispatch`. BeadBoard forwards to `claude-agent-service` with
  the existing bearer-token flow.
- **`beads-reaper`** — every 10 min: flips any `assignee=agent` +
  `status=in_progress` bead whose `updated_at` is older than 30 min to
  `status=blocked` with an explanatory note. Catches pod crashes mid-run.

The manual BeadBoard Dispatch button continues to work in parallel.

## Flow diagram

```
user: bd assign <id> agent
        │
        ▼
Dolt @ dolt.beads-server.svc:3306  ◄──── every 2 min ────┐
        │                                                │
        ▼                                                │
CronJob: beads-dispatcher                                │
  1. GET beadboard/api/agent-status (busy?)              │
  2. bd query 'assignee=agent AND status=open'           │
  3. bd update -s in_progress (claim)                    │
  4. POST beadboard/api/agent-dispatch                   │
  5. bd note "dispatched: job=…"                         │
        │                                                │
        ▼                                                │
claude-agent-service /execute                            │
  beads-task-runner agent runs; notes/closes bead        │
        │                                                │
        ▼                                                │
done ──► next tick picks up the next bead ───────────────┘


CronJob: beads-reaper (every 10 min)
  for bead (assignee=agent, status=in_progress, updated_at > 30 min):
    bd note "reaper: no progress for Nm — blocking"
    bd update -s blocked
```

## Usage

### Hand a bead to the agent

```
bd create "Title" \
  -d "Full context — files, services, error messages. Any agent with no prior context must be able to execute this." \
  --acceptance "Concrete, verifiable criteria" \
  -p 2
bd assign <new-id> agent
```

**Acceptance criteria is required.** Beads without it are skipped by the
dispatcher and stay in `open` forever. This is intentional — the
`beads-task-runner` agent expects clear done conditions.

### Take a bead back (unassign)

```
bd assign <id> ""
```

If the bead is already `in_progress`, also reset it:

```
bd update <id> -s open
```

### Pause auto-dispatch

```
cd infra/stacks/beads-server
scripts/tg apply -var=beads_dispatcher_enabled=false
```

This sets `spec.suspend: true` on both CronJobs. Existing running jobs
continue; no new ticks fire. Re-enable by re-applying with
`beads_dispatcher_enabled=true` (the default). Manual BeadBoard Dispatch
remains available while paused.

### Read the logs

```
# Recent dispatcher runs
kubectl -n beads-server get jobs --selector=job-name --sort-by=.metadata.creationTimestamp | grep beads-dispatcher | tail
kubectl -n beads-server logs job/<dispatcher-job-name>

# Tail the underlying agent once a bead dispatches
kubectl -n claude-agent logs -l app=claude-agent-service -f

# Inspect reaper decisions
kubectl -n beads-server get jobs | grep beads-reaper | tail
kubectl -n beads-server logs job/<reaper-job-name>
```

### Inspect a specific bead's dispatch history

```
bd show <id> --json | jq '{status, assignee, notes, updated_at}'
```

Both the dispatcher and reaper write dated notes (`auto-dispatcher claimed
at…`, `dispatched: job=…`, `reaper: no progress for…`) so the audit trail
lives on the bead itself.

## Reaper semantics — when a bead becomes `blocked`

The reaper flips a bead to `blocked` if:
- `assignee = agent`, AND
- `status = in_progress`, AND
- `updated_at` is more than **30 minutes** in the past.

Every `bd note` bumps `updated_at`, so a well-behaved `beads-task-runner`
agent never trips the reaper — it notes progress as it works. A `blocked`
bead is a signal that:
- the agent pod crashed mid-run (`kubectl -n claude-agent delete pod` test),
- the job hit its 15-minute budget timeout inside `claude-agent-service`
  without notes (rare — the agent usually notes failure before exiting),
- `claude-agent-service` was restarted during the run (in-memory job state
  is lost; see [known risks](#known-risks)).

Recovery: read the reaper note, reopen manually if appropriate:

```
bd update <id> -s open
bd assign <id> agent # re-arm for next dispatcher tick
```

## Design choices

- **Sentinel assignee `agent`** — free-form, no Beads schema change. Any bd
  client can set it (`bd assign <id> agent`).
- **Sequential dispatch** — matches `claude-agent-service`'s single-slot
  `asyncio.Lock`. With a 2-min poll cadence and ~5-min average run,
  throughput is ~12 beads/hour. Parallelism is a separate plan.
- **Fixed agent (`beads-task-runner`)** — read-only rails, matches BeadBoard's
  manual Dispatch button. Broader-privilege agents stay manual.
- **CronJob (not in-service polling, not n8n)** — matches existing infra
  pattern (OpenClaw task-processor, certbot-renewal, backups), TF-managed,
  easy to pause.
- **ConfigMap-mounted `metadata.json`** — declarative TF rather than reusing
  the image-seeded file. The CronJob's init step copies it into `/tmp/.beads/`
  because `bd` may touch the parent directory and ConfigMap mounts are
  read-only.

## Known risks

- **In-memory job state in `claude-agent-service`** — if the pod restarts
  mid-run, the job record is lost. The reaper catches this after 30 min.
  Persistent job store is deferred.
- **Prompt injection via bead fields** — a malicious bead description could
  try to steer the agent. The `beads-task-runner` rails + token budget +
  timeout are the defense. Identical exposure as the manual Dispatch button.
- **Image tag drift** — `claude_agent_service_image_tag` in
  `stacks/beads-server/main.tf` mirrors `local.image_tag` in
  `stacks/claude-agent-service/main.tf`. Bump both when the image rebuilds,
  or the dispatcher/reaper will run on an older layer. (They only need
  `bd`, `curl`, `jq` — stable across rebuilds — so the drift is low-risk.)
- **`bd` JSON schema changes** — the reaper's `jq` reads `.id` and
  `.updated_at`. If a future `bd` upgrade renames these, the reaper breaks
  silently (no reaping, no alert). `BD_VERSION` is pinned in the image
  Dockerfile.

## Verification after change

```
# Both CronJobs exist with the right schedule / SUSPEND state
kubectl -n beads-server get cronjob

# End-to-end smoke test
bd create "auto-dispatch smoke test" \
  -d "Read /etc/hostname inside the agent sandbox and close." \
  --acceptance "bd note includes 'hostname=' and bead is closed."
bd assign <new-id> agent
# within 2 min:
bd show <new-id> --json | jq '.notes'
# → contains 'auto-dispatcher claimed' + 'dispatched: job=<uuid>'
```

stacks/beads-server/main.tf
@@ -8,6 +8,22 @@ variable "beadboard_image_tag" {
  default = "17a38e43"
}

# Mirrors `local.image_tag` in stacks/claude-agent-service/main.tf — keep in
# sync when the claude-agent-service image is rebuilt. Reused here because the
# dispatcher + reaper CronJobs only need bd, curl, and jq, which that image
# already ships.
variable "claude_agent_service_image_tag" {
  type    = string
  default = "0c24c9b6"
}

# Kill switch for auto-dispatch. When false, both CronJobs are suspended. The
# manual BeadBoard Dispatch button keeps working either way.
variable "beads_dispatcher_enabled" {
  type    = bool
  default = true
}

resource "kubernetes_namespace" "beads" {
  metadata {
    name = "beads-server"
@@ -671,3 +687,274 @@ module "beadboard_ingress" {
    "gethomepage.dev/pod-selector" = ""
  }
}

# ── Beads auto-dispatch (dispatcher + reaper CronJobs) ──
#
# Flow:
#   user: bd assign <id> agent
#     └──> CronJob: beads-dispatcher (every 2 min)
#            1. GET BeadBoard /api/agent-status — skip if claude-agent-service busy
#            2. bd query 'assignee=agent AND status=open' — pick highest priority
#            3. bd update -s in_progress (claim; next tick won't re-pick)
#            4. POST BeadBoard /api/agent-dispatch — reuses prompt-build + bearer flow
#            5. bd note "dispatched: job=<id>" (or rollback + note on failure)
#
#   CronJob: beads-reaper (every 10 min)
#     └── for bead (assignee=agent, status=in_progress, updated_at > 30m):
#           bd update -s blocked + bd note (recover from pod crashes mid-run)
#
# The claude-agent-service image ships bd + jq + curl — no separate image built.

resource "kubernetes_config_map" "beads_metadata" {
  metadata {
    name      = "beads-metadata"
    namespace = kubernetes_namespace.beads.metadata[0].name
  }
  data = {
    "metadata.json" = jsonencode({
      database         = "dolt"
      backend          = "dolt"
      dolt_mode        = "server"
      dolt_server_host = "${kubernetes_service.dolt.metadata[0].name}.${kubernetes_namespace.beads.metadata[0].name}.svc.cluster.local"
      dolt_server_port = 3306
      dolt_server_user = "beads"
      dolt_database    = "code"
      project_id       = "a8f8bae7-ce65-4145-a5db-a13d11d297da"
    })
  }
}

locals {
  claude_agent_service_image = "registry.viktorbarzin.me/claude-agent-service:${var.claude_agent_service_image_tag}"
  beadboard_internal_url     = "http://${kubernetes_service.beadboard.metadata[0].name}.${kubernetes_namespace.beads.metadata[0].name}.svc.cluster.local"

  beads_script_prelude = <<-EOT
    set -euo pipefail
    # bd with Dolt server mode needs metadata.json in a directory it can walk.
    # ConfigMap mounts are read-only — copy to a writable location before use.
    mkdir -p /tmp/.beads
    cp /etc/beads-metadata/metadata.json /tmp/.beads/metadata.json
  EOT
}

resource "kubernetes_cron_job_v1" "beads_dispatcher" {
  metadata {
    name      = "beads-dispatcher"
    namespace = kubernetes_namespace.beads.metadata[0].name
  }
  spec {
    schedule                      = "*/2 * * * *"
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3
    starting_deadline_seconds     = 60
    suspend                       = !var.beads_dispatcher_enabled
    job_template {
      metadata {}
      spec {
        backoff_limit              = 0
        ttl_seconds_after_finished = 600
        template {
          metadata {
            labels = {
              app = "beads-dispatcher"
            }
          }
          spec {
            restart_policy = "Never"
            image_pull_secrets {
              name = "registry-credentials"
            }
            container {
              name  = "dispatcher"
              image = local.claude_agent_service_image
              command = ["/bin/sh", "-c", <<-EOT
                ${local.beads_script_prelude}

                BUSY=$(curl -sf "$${BEADBOARD_URL}/api/agent-status" | jq -r '.busy // false')
                if [ "$BUSY" != "false" ]; then
                  echo "claude-agent-service is busy — skipping tick"
                  exit 0
                fi

                BEAD=$(bd --db /tmp/.beads query 'assignee=agent AND status=open' --json \
                  | jq -r '[.[] | select(.acceptance_criteria and (.acceptance_criteria | length) > 0)]
                           | sort_by(.priority, .updated_at)[0].id // empty')

                if [ -z "$BEAD" ]; then
                  echo "no eligible beads (assignee=agent, status=open, has acceptance_criteria)"
                  exit 0
                fi

                echo "picked bead: $BEAD"

                bd --db /tmp/.beads update "$BEAD" -s in_progress
                bd --db /tmp/.beads note "$BEAD" "auto-dispatcher claimed at $(date -u +%Y-%m-%dT%H:%M:%SZ)"

                RESP=$(curl -sS -w '\n%%{http_code}' -X POST \
                  -H 'Content-Type: application/json' \
                  -d "{\"taskId\":\"$BEAD\"}" \
                  "$${BEADBOARD_URL}/api/agent-dispatch")
                CODE=$(printf '%s' "$RESP" | tail -n1)
                BODY=$(printf '%s' "$RESP" | sed '$d')

                if [ "$CODE" = "200" ]; then
                  JOB_ID=$(printf '%s' "$BODY" | jq -r '.job_id // "unknown"')
                  bd --db /tmp/.beads note "$BEAD" "dispatched: job=$JOB_ID"
                  echo "dispatched $BEAD as job $JOB_ID"
                else
                  # Roll the claim back so the next tick can retry.
                  bd --db /tmp/.beads update "$BEAD" -s open
                  bd --db /tmp/.beads note "$BEAD" "dispatch failed HTTP $CODE: $BODY"
                  echo "dispatch FAILED for $BEAD: HTTP $CODE — $BODY" >&2
                  exit 1
                fi
              EOT
              ]
              env {
                name  = "BEADBOARD_URL"
                value = local.beadboard_internal_url
              }
              env {
                name = "API_BEARER_TOKEN"
                value_from {
                  secret_key_ref {
                    name = "beadboard-agent-service"
                    key  = "api_bearer_token"
                  }
                }
              }
              env {
                name  = "BEADS_ACTOR"
                value = "beads-dispatcher"
              }
              env {
                name  = "HOME"
                value = "/tmp"
              }
              volume_mount {
                name       = "beads-metadata"
                mount_path = "/etc/beads-metadata"
                read_only  = true
              }
              resources {
                requests = {
                  cpu    = "50m"
                  memory = "128Mi"
                }
                limits = {
                  memory = "256Mi"
                }
              }
            }
            volume {
              name = "beads-metadata"
              config_map {
                name = kubernetes_config_map.beads_metadata.metadata[0].name
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}

resource "kubernetes_cron_job_v1" "beads_reaper" {
  metadata {
    name      = "beads-reaper"
    namespace = kubernetes_namespace.beads.metadata[0].name
  }
  spec {
    schedule                      = "*/10 * * * *"
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3
    starting_deadline_seconds     = 60
    suspend                       = !var.beads_dispatcher_enabled
    job_template {
      metadata {}
      spec {
        backoff_limit              = 0
        ttl_seconds_after_finished = 600
        template {
          metadata {
            labels = {
              app = "beads-reaper"
            }
          }
          spec {
            restart_policy = "Never"
            image_pull_secrets {
              name = "registry-credentials"
            }
            container {
              name  = "reaper"
              image = local.claude_agent_service_image
              command = ["/bin/sh", "-c", <<-EOT
                ${local.beads_script_prelude}

                THRESHOLD_MIN=30
                NOW=$(date -u +%s)

                bd --db /tmp/.beads query 'assignee=agent AND status=in_progress' --json \
                  | jq -c '.[]' \
                  | while read -r BEAD_JSON; do
                      ID=$(printf '%s' "$BEAD_JSON" | jq -r '.id')
                      LAST_UPDATE=$(printf '%s' "$BEAD_JSON" | jq -r '.updated_at')
                      # Alpine's busybox date lacks GNU -d; parse ISO-8601 with python3.
                      LAST_TS=$(python3 -c "from datetime import datetime; print(int(datetime.fromisoformat('$LAST_UPDATE'.replace('Z','+00:00')).timestamp()))")
                      AGE_MIN=$(( (NOW - LAST_TS) / 60 ))
                      if [ "$AGE_MIN" -gt "$THRESHOLD_MIN" ]; then
                        bd --db /tmp/.beads note "$ID" "reaper: no progress for $${AGE_MIN}m (threshold $${THRESHOLD_MIN}m) — blocking"
                        bd --db /tmp/.beads update "$ID" -s blocked
                        echo "REAPED $ID (stale $${AGE_MIN}m)"
                      else
                        echo "keeping $ID (age $${AGE_MIN}m < $${THRESHOLD_MIN}m)"
                      fi
                    done
              EOT
              ]
              env {
                name  = "BEADS_ACTOR"
                value = "beads-reaper"
              }
              env {
                name  = "HOME"
                value = "/tmp"
              }
              volume_mount {
                name       = "beads-metadata"
                mount_path = "/etc/beads-metadata"
                read_only  = true
              }
              resources {
                requests = {
                  cpu    = "50m"
                  memory = "128Mi"
                }
                limits = {
                  memory = "256Mi"
                }
              }
            }
            volume {
              name = "beads-metadata"
              config_map {
                name = kubernetes_config_map.beads_metadata.metadata[0].name
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}