Viktor Barzin dd1ec4f1be docs/plans: add agent presence implementation plan (2026-05-17)

15-task plan for a shared presence board so Claude Code sessions can
see which shared infra resources are being actively mutated by other
sessions. Resource-scoped claims on the existing Dolt server,
heartbeat-driven TTL, agent-driven via CLAUDE.md rule + Python CLI.

2026-05-17 21:03:17 +00:00

46 KiB

Raw Blame History

Agent Presence Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build a shared presence board so Claude Code agent sessions can see which shared infra resources are being actively mutated by other sessions, preventing redundant investigations and overlapping operations.

Architecture: Single-table store on the existing Dolt server (10.0.20.200:3306, beads DB, new presence_claims table). Python single-file CLI (scripts/presence) writes/reads claims. Heartbeat-driven TTL — entries expire 15 min after the last heartbeat, so "left unclosed" is structurally impossible. A consolidated UserPromptSubmit hook injects other sessions' active claims into every turn for ambient awareness. CLAUDE.md rule mandates agents claim before mutating shared state.

Tech Stack: Python 3 stdlib + pymysql; Dolt (MySQL-compatible) at 10.0.20.200:3306; Bash hooks; Terraform Kubernetes provider.

Coverage of design decisions (locked in grilling):

Pure presence/coordination — not work tracking
Resource-scoped entries (<type>:<name>)
Heartbeat TTL + Stop-hook release
Agent-driven claim via CLI invoked from agent reasoning per CLAUDE.md rule
Stored on Dolt beads DB, new table
CLI verbs: claim, heartbeat, release, list, peek
UserPromptSubmit hook consolidates beads + presence
Seed vocab: node:, host:, stack:, service:, db:, pvc:, infra:
Only mutating ops trigger claim
Co-claim allowed; soft-defer protocol on conflict
MVP devvm only (no claude-agent-service / Woodpecker)
Beads coexists with cleaned semantics
Pure rule + visibility for enforcement (measure first)
Python single-file CLI at ~/code/scripts/presence

File Structure

New files:

scripts/presence — Python single-file CLI (~250 lines)
scripts/tests/test_presence.py — pytest unit tests for the CLI
scripts/tests/conftest.py — pytest fixtures (mocked DB)
.claude/hooks/presence-session-start.sh — generates session ID at start
.claude/hooks/presence-heartbeat.sh — throttled heartbeat on PostToolUse
.claude/hooks/presence-release.sh — release on Stop
.claude/hooks/agent-state-context.sh — consolidated beads+presence injector (replaces user-global beads-task-context.sh)

Modified files:

infra/stacks/beads-server/main.tf — add presence_claims schema init
.claude/settings.json — wire new hooks; swap UserPromptSubmit to consolidated script
CLAUDE.md — add the claim-before-mutate rule, seed vocab, defer protocol

Touched-but-untouched (audit only):

Stale in_progress beads items (close or revert to open)

Task 1: Create `presence_claims` table on the Dolt server

Files:

Modify: infra/stacks/beads-server/main.tf — extend the existing kubernetes_config_map.dolt_init data block + add a kubernetes_job for idempotent table creation on already-running Dolt
Apply via scripts/tg apply from infra/stacks/beads-server/

The dolt_init ConfigMap only runs on fresh Dolt PVCs. Since Dolt is already running with the existing PV, the new SQL won't fire from there. The Job is the workaround for live updates and stays idempotent forever.

Step 1: Add the schema SQL into the existing dolt_init ConfigMap

In infra/stacks/beads-server/main.tf, locate resource "kubernetes_config_map" "dolt_init" and add a second data entry:

resource "kubernetes_config_map" "dolt_init" {
  metadata {
    name      = "dolt-init"
    namespace = kubernetes_namespace.beads.metadata[0].name
  }
  data = {
    "01-create-beads-user.sql" = <<-EOT
      CREATE USER IF NOT EXISTS 'beads'@'%' IDENTIFIED BY '';
      GRANT ALL PRIVILEGES ON *.* TO 'beads'@'%' WITH GRANT OPTION;
    EOT
    "02-create-presence-table.sql" = <<-EOT
      CREATE DATABASE IF NOT EXISTS beads;
      USE beads;
      CREATE TABLE IF NOT EXISTS presence_claims (
        session_id      VARCHAR(128)  NOT NULL,
        resource_label  VARCHAR(255)  NOT NULL,
        purpose         TEXT          NOT NULL,
        claimed_at      DATETIME(3)   NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
        expires_at      DATETIME(3)   NOT NULL,
        host            VARCHAR(128)  NOT NULL,
        user            VARCHAR(64)   NOT NULL,
        agent_name      VARCHAR(64)   DEFAULT 'claude-code',
        PRIMARY KEY (session_id, resource_label),
        INDEX idx_resource (resource_label),
        INDEX idx_expires  (expires_at)
      );
    EOT
  }
}

Step 2: Add an idempotent migration Job that creates the table on the running Dolt

Append a new resource block in infra/stacks/beads-server/main.tf, after the kubernetes_deployment.dolt resource:

resource "kubernetes_job" "presence_schema_migrate" {
  metadata {
    # name includes a hash of the SQL so a real schema change forces a new Job
    name      = "presence-schema-${substr(sha256(kubernetes_config_map.dolt_init.data["02-create-presence-table.sql"]), 0, 8)}"
    namespace = kubernetes_namespace.beads.metadata[0].name
  }
  spec {
    backoff_limit = 3
    template {
      metadata {}
      spec {
        restart_policy = "OnFailure"
        container {
          name    = "migrate"
          image   = "mysql:8.4"
          command = ["sh", "-c"]
          args = [
            "mysql -h dolt.beads-server.svc.cluster.local -P 3306 -u root < /sql/02-create-presence-table.sql"
          ]
          volume_mount {
            name       = "sql"
            mount_path = "/sql"
          }
        }
        volume {
          name = "sql"
          config_map {
            name = kubernetes_config_map.dolt_init.metadata[0].name
          }
        }
      }
    }
  }
  wait_for_completion = true
  timeouts {
    create = "5m"
  }
  depends_on = [kubernetes_deployment.dolt]
}

Step 3: Apply the Terraform change

Run:

cd /home/wizard/code/infra/stacks/beads-server
../../scripts/tg apply

Expected: kubernetes_config_map.dolt_init updated + kubernetes_job.presence_schema_migrate created + Job completes successfully.

Step 4: Verify the table exists

Run:

mysql -h 10.0.20.200 -u beads -e "USE beads; SHOW TABLES LIKE 'presence_claims'; DESCRIBE presence_claims;"

Expected: one row presence_claims from SHOW TABLES; DESCRIBE shows the 8 columns with the right types.

Step 5: Commit

git add infra/stacks/beads-server/main.tf
git commit -m "beads-server: add presence_claims table for agent coordination

Adds the schema for the new agent presence board. Live Dolt is updated
via a hashed-named one-shot Job; the ConfigMap entry preserves fresh-PVC
init.
"

Task 2: Python CLI scaffolding (argparse + DB connection)

Files:

Create: scripts/presence
Create: scripts/tests/test_presence.py
Create: scripts/tests/conftest.py
Step 1: Write the failing test for --help

Create scripts/tests/test_presence.py:

import subprocess
from pathlib import Path

SCRIPT = Path(__file__).parent.parent / "presence"


def test_help_lists_subcommands():
    """--help should list all supported subcommands."""
    result = subprocess.run(
        [str(SCRIPT), "--help"], capture_output=True, text=True
    )
    assert result.returncode == 0
    for verb in ("claim", "heartbeat", "release", "list", "peek"):
        assert verb in result.stdout

Step 2: Run the test, confirm it fails

Run: pytest scripts/tests/test_presence.py::test_help_lists_subcommands -v Expected: FAIL — scripts/presence doesn't exist yet (FileNotFoundError).

Step 3: Create the CLI skeleton

Create scripts/presence:

#!/usr/bin/env python3
"""Agent presence board CLI.

Lets Claude Code agent sessions claim, heartbeat, release, list, and peek at
shared infra resource claims so that two sessions don't unknowingly mutate
the same thing at the same time.

Reads connection details from env:
  PRESENCE_DSN          mysql DSN (default: beads@10.0.20.200:3306/beads)
  CLAUDE_SESSION_ID     session identity (default: read from session-id file)
"""

from __future__ import annotations

import argparse
import getpass
import json
import os
import socket
import sys
import uuid
from pathlib import Path

SESSION_ID_FILE = Path.home() / ".cache" / "claude-presence" / "current.session"
DEFAULT_DSN = "mysql://beads@10.0.20.200:3306/beads"
DEFAULT_TTL_SECONDS = 15 * 60


def get_session_id() -> str:
    """Return the current session ID, generating a fallback if missing."""
    env = os.environ.get("CLAUDE_SESSION_ID")
    if env:
        return env
    if SESSION_ID_FILE.exists():
        return SESSION_ID_FILE.read_text().strip()
    # Fallback: ephemeral one-shot id (won't be cleaned up by Stop hook)
    return f"{getpass.getuser()}@{socket.gethostname().split('.')[0]}@{uuid.uuid4().hex[:8]}"


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(
        prog="presence",
        description="Agent presence board for coordinating shared-infra mutations.",
    )
    p.add_argument("--json", action="store_true", help="emit machine-readable output")
    sub = p.add_subparsers(dest="verb", required=True)

    c = sub.add_parser("claim", help="claim a resource you're about to mutate")
    c.add_argument("label", help="resource label, e.g. node:k8s-node1")
    c.add_argument("--purpose", required=True, help="what + why")
    c.add_argument("--ttl", type=int, default=DEFAULT_TTL_SECONDS, help="seconds")

    sub.add_parser("heartbeat", help="extend TTL on all my active claims")

    r = sub.add_parser("release", help="release one or all of my claims")
    r.add_argument("label", nargs="?", help="resource label; omit with --all-mine")
    r.add_argument("--all-mine", action="store_true")

    li = sub.add_parser("list", help="show active claims")
    g = li.add_mutually_exclusive_group()
    g.add_argument("--mine", action="store_true")
    g.add_argument("--all", action="store_true", default=True)

    pe = sub.add_parser("peek", help="show all active claims on a resource")
    pe.add_argument("label", help="resource label")

    return p


def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Verbs implemented in later tasks; stub for now so --help works.
    print(f"verb={args.verb} not yet implemented", file=sys.stderr)
    return 0


if __name__ == "__main__":
    sys.exit(main())

Step 4: Make it executable

Run: chmod +x /home/wizard/code/scripts/presence

Step 5: Re-run the test, confirm it passes

Run: pytest scripts/tests/test_presence.py::test_help_lists_subcommands -v Expected: PASS.

Step 6: Commit

git add scripts/presence scripts/tests/test_presence.py
git commit -m "presence: add CLI scaffolding with argparse subcommands"

Task 3: `claim` verb — write to DB, return conflicts

Files:

Modify: scripts/presence
Modify: scripts/tests/test_presence.py
Create: scripts/tests/conftest.py
Step 1: Add pymysql + fixture scaffolding in conftest

Create scripts/tests/conftest.py:

import os
from unittest.mock import MagicMock

import pytest


@pytest.fixture
def fake_db(monkeypatch):
    """Mocks pymysql.connect to return a MagicMock cursor we can inspect."""
    conn = MagicMock(name="conn")
    cur = MagicMock(name="cur")
    conn.cursor.return_value.__enter__.return_value = cur
    cur.fetchall.return_value = []

    import pymysql
    monkeypatch.setattr(pymysql, "connect", MagicMock(return_value=conn))
    monkeypatch.setenv("CLAUDE_SESSION_ID", "wizard@devvm@testtest")
    return cur

Step 2: Write the failing test for claim happy path

Append to scripts/tests/test_presence.py:

import importlib.util
import sys
from pathlib import Path


def _load_module():
    spec = importlib.util.spec_from_file_location("presence", SCRIPT)
    mod = importlib.util.module_from_spec(spec)
    sys.modules["presence"] = mod
    spec.loader.exec_module(mod)
    return mod


def test_claim_inserts_row(fake_db):
    presence = _load_module()
    rc = presence.main(["claim", "node:k8s-node1", "--purpose", "GPU upgrade"])
    assert rc == 0
    # First call: insert/upsert; second: read existing other-session claims
    sql_calls = [c.args[0] for c in fake_db.execute.call_args_list]
    assert any("INSERT" in s.upper() or "REPLACE" in s.upper() for s in sql_calls)
    assert any("SELECT" in s.upper() for s in sql_calls)


def test_claim_reports_other_session_conflict(fake_db, capsys):
    presence = _load_module()
    # Simulate one OTHER session already holding the label
    fake_db.fetchall.return_value = [
        {
            "session_id": "emo@laptop@aaaaaaaa",
            "purpose": "tcpdump on uplink",
            "claimed_at": "2026-05-17 14:10:00.000",
            "user": "emo",
            "host": "laptop",
        }
    ]
    rc = presence.main(["claim", "node:k8s-node1", "--purpose", "GPU upgrade"])
    out = capsys.readouterr().out
    assert rc == 0
    assert "emo@laptop@aaaaaaaa" in out
    assert "tcpdump on uplink" in out

Step 3: Run the tests, confirm they fail

Run: pytest scripts/tests/test_presence.py -v -k claim Expected: 2 failures — claim verb not implemented (stub prints "not yet implemented").

Step 4: Implement claim in scripts/presence

Replace the bottom of scripts/presence (the stub main) with this. Also add the DB helpers and _claim function above main:

import urllib.parse

try:
    import pymysql
    import pymysql.cursors
except ImportError:
    pymysql = None  # graceful: handled in _connect


def _connect():
    if pymysql is None:
        return None
    dsn = os.environ.get("PRESENCE_DSN", DEFAULT_DSN)
    u = urllib.parse.urlparse(dsn)
    try:
        return pymysql.connect(
            host=u.hostname,
            port=u.port or 3306,
            user=u.username or "beads",
            password=u.password or "",
            database=(u.path.lstrip("/") or "beads"),
            cursorclass=pymysql.cursors.DictCursor,
            connect_timeout=3,
            autocommit=True,
        )
    except Exception as e:
        print(f"presence: warning: dolt unreachable ({e}); continuing", file=sys.stderr)
        return None


def _claim(args, session_id: str) -> int:
    conn = _connect()
    if conn is None:
        return 0  # graceful degradation
    with conn.cursor() as cur:
        cur.execute(
            """
            REPLACE INTO presence_claims
                (session_id, resource_label, purpose, claimed_at, expires_at, host, user, agent_name)
            VALUES
                (%s, %s, %s, NOW(3), NOW(3) + INTERVAL %s SECOND, %s, %s, %s)
            """,
            (
                session_id,
                args.label,
                args.purpose,
                args.ttl,
                socket.gethostname().split(".")[0],
                getpass.getuser(),
                "claude-code",
            ),
        )
        cur.execute(
            """
            SELECT session_id, purpose, claimed_at, user, host
              FROM presence_claims
             WHERE resource_label = %s
               AND session_id    != %s
               AND expires_at    >  NOW(3)
             ORDER BY claimed_at
            """,
            (args.label, session_id),
        )
        conflicts = cur.fetchall()
    if not conflicts:
        print(f"presence: claimed {args.label}")
        return 0
    print(f"presence: claimed {args.label} -- ALSO CLAIMED BY:")
    for c in conflicts:
        print(f"  - {c['session_id']} ({c['user']}@{c['host']}): {c['purpose']} since {c['claimed_at']}")
    print("presence: per CLAUDE.md rule, default is to DEFER — release your claim and confirm with the user.")
    return 0

Update main to dispatch:

def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    session_id = get_session_id()
    if args.verb == "claim":
        return _claim(args, session_id)
    print(f"verb={args.verb} not yet implemented", file=sys.stderr)
    return 0

Step 5: Run tests, confirm they pass

Run: pytest scripts/tests/test_presence.py -v -k claim Expected: both test_claim_inserts_row and test_claim_reports_other_session_conflict PASS.

Step 6: Commit

git add scripts/presence scripts/tests/test_presence.py scripts/tests/conftest.py
git commit -m "presence: implement claim verb (upsert + conflict report)"

Task 4: `peek` and `list` verbs (read paths)

Files:

Modify: scripts/presence
Modify: scripts/tests/test_presence.py
Step 1: Write the failing tests for peek and list

Append to scripts/tests/test_presence.py:

def test_peek_shows_all_active_claims_for_resource(fake_db, capsys):
    presence = _load_module()
    fake_db.fetchall.return_value = [
        {
            "session_id": "wizard@devvm@bbbbbbbb",
            "purpose": "GPU driver upgrade",
            "claimed_at": "2026-05-17 14:32:00.000",
            "expires_at": "2026-05-17 14:47:00.000",
            "user": "wizard",
            "host": "devvm",
        }
    ]
    rc = presence.main(["peek", "node:k8s-node1"])
    out = capsys.readouterr().out
    assert rc == 0
    assert "wizard@devvm@bbbbbbbb" in out
    assert "GPU driver upgrade" in out


def test_peek_empty_resource_prints_no_active_claim(fake_db, capsys):
    presence = _load_module()
    fake_db.fetchall.return_value = []
    rc = presence.main(["peek", "node:k8s-node99"])
    out = capsys.readouterr().out
    assert rc == 0
    assert "no active claim" in out.lower()


def test_list_all_shows_only_active(fake_db, capsys):
    presence = _load_module()
    fake_db.fetchall.return_value = [
        {
            "session_id": "wizard@devvm@xxxxxxxx",
            "resource_label": "stack:gpu-operator",
            "purpose": "rebuild driver",
            "claimed_at": "2026-05-17 14:00:00.000",
            "expires_at": "2026-05-17 14:15:00.000",
            "user": "wizard",
            "host": "devvm",
        }
    ]
    rc = presence.main(["list", "--all"])
    out = capsys.readouterr().out
    assert rc == 0
    assert "stack:gpu-operator" in out
    assert "wizard@devvm@xxxxxxxx" in out


def test_list_mine_filters_to_current_session(fake_db, monkeypatch):
    presence = _load_module()
    presence.main(["list", "--mine"])
    sql = fake_db.execute.call_args_list[-1].args[0]
    assert "session_id" in sql
    assert "expires_at" in sql

Step 2: Run the tests, confirm they fail

Run: pytest scripts/tests/test_presence.py -v -k "peek or list" Expected: 4 failures — verbs unimplemented.

Step 3: Implement peek and list

Add to scripts/presence, above main:

def _peek(args, session_id: str) -> int:
    conn = _connect()
    if conn is None:
        return 0
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT session_id, purpose, claimed_at, expires_at, user, host
              FROM presence_claims
             WHERE resource_label = %s
               AND expires_at    >  NOW(3)
             ORDER BY claimed_at
            """,
            (args.label,),
        )
        rows = cur.fetchall()
    if not rows:
        print(f"presence: no active claim on {args.label}")
        return 0
    print(f"presence: active claims on {args.label}:")
    for r in rows:
        marker = " (me)" if r["session_id"] == session_id else ""
        print(f"  - {r['session_id']}{marker} ({r['user']}@{r['host']}): {r['purpose']} since {r['claimed_at']} (expires {r['expires_at']})")
    return 0


def _list(args, session_id: str) -> int:
    conn = _connect()
    if conn is None:
        return 0
    query = """
        SELECT session_id, resource_label, purpose, claimed_at, expires_at, user, host
          FROM presence_claims
         WHERE expires_at > NOW(3)
    """
    params: tuple = ()
    if args.mine:
        query += " AND session_id = %s"
        params = (session_id,)
    query += " ORDER BY claimed_at"
    with conn.cursor() as cur:
        cur.execute(query, params)
        rows = cur.fetchall()
    if not rows:
        print("presence: no active claims")
        return 0
    for r in rows:
        marker = " (me)" if r["session_id"] == session_id else ""
        print(f"  {r['resource_label']:<32} {r['session_id']}{marker} -- {r['purpose']} ({r['claimed_at']})")
    return 0

Extend the dispatcher in main:

    if args.verb == "claim":
        return _claim(args, session_id)
    if args.verb == "peek":
        return _peek(args, session_id)
    if args.verb == "list":
        return _list(args, session_id)

Step 4: Run tests, confirm they pass

Run: pytest scripts/tests/test_presence.py -v -k "peek or list" Expected: 4 PASSES.

Step 5: Commit

git add scripts/presence scripts/tests/test_presence.py
git commit -m "presence: implement peek + list verbs"

Task 5: `heartbeat` and `release` verbs

Files:

Modify: scripts/presence
Modify: scripts/tests/test_presence.py
Step 1: Write the failing tests

Append to scripts/tests/test_presence.py:

def test_heartbeat_extends_all_my_claims(fake_db):
    presence = _load_module()
    rc = presence.main(["heartbeat"])
    assert rc == 0
    sql = fake_db.execute.call_args_list[-1].args[0]
    assert "UPDATE" in sql.upper()
    assert "expires_at" in sql
    assert "session_id" in sql


def test_release_single_label(fake_db):
    presence = _load_module()
    rc = presence.main(["release", "node:k8s-node1"])
    assert rc == 0
    last = fake_db.execute.call_args_list[-1]
    assert "DELETE" in last.args[0].upper()
    assert "node:k8s-node1" in last.args[1]


def test_release_all_mine(fake_db):
    presence = _load_module()
    rc = presence.main(["release", "--all-mine"])
    assert rc == 0
    last = fake_db.execute.call_args_list[-1]
    assert "DELETE" in last.args[0].upper()
    assert "wizard@devvm@testtest" in last.args[1]

Step 2: Run tests, confirm they fail

Run: pytest scripts/tests/test_presence.py -v -k "heartbeat or release" Expected: 3 failures.

Step 3: Implement heartbeat and release

Add to scripts/presence:

def _heartbeat(args, session_id: str) -> int:
    conn = _connect()
    if conn is None:
        return 0
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE presence_claims
               SET expires_at = NOW(3) + INTERVAL %s SECOND
             WHERE session_id = %s
               AND expires_at > NOW(3)
            """,
            (DEFAULT_TTL_SECONDS, session_id),
        )
    return 0


def _release(args, session_id: str) -> int:
    conn = _connect()
    if conn is None:
        return 0
    with conn.cursor() as cur:
        if args.all_mine:
            cur.execute("DELETE FROM presence_claims WHERE session_id = %s", (session_id,))
        else:
            if not args.label:
                print("presence: release requires <label> or --all-mine", file=sys.stderr)
                return 2
            cur.execute(
                "DELETE FROM presence_claims WHERE session_id = %s AND resource_label = %s",
                (session_id, args.label),
            )
    return 0

Extend dispatcher:

    if args.verb == "heartbeat":
        return _heartbeat(args, session_id)
    if args.verb == "release":
        return _release(args, session_id)

Step 4: Run tests, confirm they pass

Run: pytest scripts/tests/test_presence.py -v -k "heartbeat or release" Expected: 3 PASSES.

Step 5: Commit

git add scripts/presence scripts/tests/test_presence.py
git commit -m "presence: implement heartbeat + release verbs"

Task 6: `--json` output mode (used by the hook)

Files:

Modify: scripts/presence
Modify: scripts/tests/test_presence.py

The UserPromptSubmit hook needs structured output to render the "currently in flight" section. --json covers list (the most common machine-consumed verb).

Step 1: Write the failing test

Append:

def test_list_json_emits_array(fake_db, capsys):
    presence = _load_module()
    fake_db.fetchall.return_value = [
        {
            "session_id": "wizard@devvm@yyyyyyyy",
            "resource_label": "stack:gpu-operator",
            "purpose": "rebuild driver",
            "claimed_at": "2026-05-17 14:00:00.000",
            "expires_at": "2026-05-17 14:15:00.000",
            "user": "wizard",
            "host": "devvm",
        }
    ]
    rc = presence.main(["--json", "list", "--all"])
    out = capsys.readouterr().out
    assert rc == 0
    payload = json.loads(out)
    assert isinstance(payload, list)
    assert payload[0]["resource_label"] == "stack:gpu-operator"

Add import json at top of test_presence.py if not already there.

Step 2: Run, confirm it fails

Run: pytest scripts/tests/test_presence.py::test_list_json_emits_array -v Expected: FAIL (--json not consumed; output not JSON).

Step 3: Implement JSON output

In _list, branch on getattr(args, "json", False). The --json flag is on the top-level parser already; we need to forward it. The cleanest path is to attach it onto the namespace post-parse:

In main, right after args = parser.parse_args(argv):

    # propagate top-level --json onto args.json (no-op if already there)
    args.json = bool(getattr(args, "json", False))

In _list, replace the printing branch with:

    if args.json:
        out = []
        for r in rows:
            out.append({
                "session_id":      r["session_id"],
                "resource_label":  r["resource_label"],
                "purpose":         r["purpose"],
                "claimed_at":      str(r["claimed_at"]),
                "expires_at":      str(r["expires_at"]),
                "user":            r["user"],
                "host":            r["host"],
                "is_me":           r["session_id"] == session_id,
            })
        print(json.dumps(out))
        return 0
    if not rows:
        print("presence: no active claims")
        return 0
    for r in rows:
        marker = " (me)" if r["session_id"] == session_id else ""
        print(f"  {r['resource_label']:<32} {r['session_id']}{marker} -- {r['purpose']} ({r['claimed_at']})")
    return 0

Step 4: Run, confirm it passes

Run: pytest scripts/tests/test_presence.py::test_list_json_emits_array -v Expected: PASS.

Step 5: Commit

git add scripts/presence scripts/tests/test_presence.py
git commit -m "presence: add --json output for list verb (hook consumption)"

Task 7: Verify graceful degradation when Dolt is unreachable

Files:

Modify: scripts/tests/test_presence.py

The CLI MUST exit 0 with a stderr warning when the DB is down. We've coded this; now lock it in with a test.

Step 1: Write the failing test

Append:

def test_claim_returns_zero_when_db_unreachable(monkeypatch, capsys):
    monkeypatch.setenv("CLAUDE_SESSION_ID", "wizard@devvm@offlineee")
    monkeypatch.setenv("PRESENCE_DSN", "mysql://beads@127.0.0.1:1/beads")
    presence = _load_module()
    rc = presence.main(["claim", "node:nowhere", "--purpose", "test"])
    err = capsys.readouterr().err
    assert rc == 0
    assert "dolt unreachable" in err

Step 2: Run, confirm it passes

Run: pytest scripts/tests/test_presence.py::test_claim_returns_zero_when_db_unreachable -v Expected: PASS (already implemented; this just locks it in).

Step 3: Run the full test suite to confirm nothing regressed

Run: pytest scripts/tests/test_presence.py -v Expected: all tests PASS.

Step 4: Commit

git add scripts/tests/test_presence.py
git commit -m "presence: lock in graceful degradation when Dolt unreachable"

Task 8: SessionStart hook — write the session ID

Files:

Create: .claude/hooks/presence-session-start.sh
Step 1: Write the script

Create /home/wizard/code/.claude/hooks/presence-session-start.sh:

#!/bin/bash
# SessionStart hook: generate a stable session ID and write it where the
# other hooks + the presence CLI can find it.
#
# Single-session-per-user limitation: last-started session wins. Override
# with `export CLAUDE_SESSION_ID=...` per shell if you need parallel sessions.

set -euo pipefail

DIR="$HOME/.cache/claude-presence"
mkdir -p "$DIR"

USER_SHORT="$(whoami)"
HOST_SHORT="$(hostname -s)"
RAND="$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')"
SESSION_ID="${USER_SHORT}@${HOST_SHORT}@${RAND}"

echo -n "$SESSION_ID" > "$DIR/current.session"
exit 0

Make executable:

chmod +x /home/wizard/code/.claude/hooks/presence-session-start.sh

Step 2: Smoke-test

Run:

/home/wizard/code/.claude/hooks/presence-session-start.sh
cat ~/.cache/claude-presence/current.session

Expected: a line like wizard@devvm@a1b2c3d4.

Step 3: Commit

git add .claude/hooks/presence-session-start.sh
git commit -m "presence: SessionStart hook generates session id"

Task 9: PostToolUse heartbeat hook (throttled)

Files:

Create: .claude/hooks/presence-heartbeat.sh

Heartbeat on every tool call would spam the DB. Throttle to at most once every 120 seconds per session.

Step 1: Write the script

Create /home/wizard/code/.claude/hooks/presence-heartbeat.sh:

#!/bin/bash
# PostToolUse hook: throttled heartbeat. Extends TTL on this session's
# active claims so they don't expire while we're working.
set -euo pipefail

SESSION_FILE="$HOME/.cache/claude-presence/current.session"
[[ -r "$SESSION_FILE" ]] || exit 0

SESSION_ID="$(cat "$SESSION_FILE")"
STAMP="/tmp/claude-presence-last-hb-${SESSION_ID}"

NOW=$(date +%s)
LAST=$(stat -c %Y "$STAMP" 2>/dev/null || echo 0)
if (( NOW - LAST < 120 )); then
    exit 0
fi
touch "$STAMP"

CLAUDE_SESSION_ID="$SESSION_ID" \
    /home/wizard/code/scripts/presence heartbeat >/dev/null 2>&1 || true
exit 0

Make executable:

chmod +x /home/wizard/code/.claude/hooks/presence-heartbeat.sh

Step 2: Smoke-test by setting an old timestamp and running twice

Run:

rm -f /tmp/claude-presence-last-hb-*
/home/wizard/code/.claude/hooks/presence-heartbeat.sh
# Second call within 120s should silently no-op
/home/wizard/code/.claude/hooks/presence-heartbeat.sh
echo "OK"

Expected: both invocations exit 0, only the first one would have hit Dolt (you can verify by querying presence_claims.expires_at if a claim exists).

Step 3: Commit

git add .claude/hooks/presence-heartbeat.sh
git commit -m "presence: PostToolUse heartbeat hook (throttled to 1/2min)"

Task 10: Stop hook — release all of my claims

Files:

Create: .claude/hooks/presence-release.sh
Step 1: Write the script

Create /home/wizard/code/.claude/hooks/presence-release.sh:

#!/bin/bash
# Stop hook: release all active claims for this session and clean up
# session-id cache. Best-effort — never blocks shutdown.
set -euo pipefail

SESSION_FILE="$HOME/.cache/claude-presence/current.session"
[[ -r "$SESSION_FILE" ]] || exit 0

SESSION_ID="$(cat "$SESSION_FILE")"

CLAUDE_SESSION_ID="$SESSION_ID" \
    /home/wizard/code/scripts/presence release --all-mine >/dev/null 2>&1 || true

# clean the per-session heartbeat-stamp file
rm -f "/tmp/claude-presence-last-hb-${SESSION_ID}"
exit 0

Make executable:

chmod +x /home/wizard/code/.claude/hooks/presence-release.sh

Step 2: Smoke-test

Run:

/home/wizard/code/.claude/hooks/presence-release.sh
echo "exit=$?"

Expected: prints exit=0.

Step 3: Commit

git add .claude/hooks/presence-release.sh
git commit -m "presence: Stop hook releases all session claims"

Task 11: Consolidate beads + presence into one UserPromptSubmit hook

Files:

Create: .claude/hooks/agent-state-context.sh

Replaces the user-global ~/.claude/hooks/beads-task-context.sh. The new hook emits one "Agent state context" block with two sections.

Step 1: Write the consolidated hook

Create /home/wizard/code/.claude/hooks/agent-state-context.sh:

#!/bin/bash
# UserPromptSubmit hook: inject (1) beads in-progress + open tasks, and
# (2) other sessions' active presence claims. One consolidated block.

set -euo pipefail

BD="/home/wizard/.local/bin/bd"
BEADS_DIR="/home/wizard/code/.beads"
PRESENCE="/home/wizard/code/scripts/presence"
SESSION_FILE="$HOME/.cache/claude-presence/current.session"

CTX=""

# Section 1: beads
if [[ -x "$BD" && -d "$BEADS_DIR" ]]; then
    IN_PROG="$("$BD" --db "$BEADS_DIR" list --status in_progress 2>/dev/null || true)"
    OPEN="$("$BD" --db "$BEADS_DIR" list --status open 2>/dev/null | head -10 || true)"
    if [[ -n "$IN_PROG$OPEN" ]]; then
        CTX+="Active beads tasks (from shared Dolt server at 10.0.20.200):
"
        [[ -n "$IN_PROG" ]] && CTX+="
IN PROGRESS:
$IN_PROG
"
        [[ -n "$OPEN" ]] && CTX+="
OPEN (available):
$OPEN
"
        CTX+="
Use beads for tracked TODO work (multi-step epics, things to remember,
work that blocks other tasks). Skip beads for small in-session changes.
"
    fi
fi

# Section 2: presence — only other sessions' active claims
if [[ -x "$PRESENCE" && -r "$SESSION_FILE" ]]; then
    SESSION_ID="$(cat "$SESSION_FILE")"
    OTHERS="$(CLAUDE_SESSION_ID="$SESSION_ID" "$PRESENCE" --json list --all 2>/dev/null \
                | python3 -c "
import json, sys
try:
    rows = json.load(sys.stdin)
except Exception:
    sys.exit(0)
others = [r for r in rows if not r.get('is_me')]
if not others:
    sys.exit(0)
print('Currently being mutated by OTHER sessions:')
for r in others:
    print(f'  {r[\"resource_label\"]:<28} {r[\"session_id\"]} -- {r[\"purpose\"]} (since {r[\"claimed_at\"]})')
" 2>/dev/null || true)"
    if [[ -n "$OTHERS" ]]; then
        CTX+="
$OTHERS

If any of these resources overlap with what you're about to mutate,
DEFER and surface to the user. See CLAUDE.md \"Agent presence\" rule.
"
    fi
fi

[[ -z "$CTX" ]] && exit 0

python3 -c "
import json, sys
ctx = sys.stdin.read()
print(json.dumps({
    'hookSpecificOutput': {
        'hookEventName': 'UserPromptSubmit',
        'additionalContext': ctx
    }
}))
" <<< "$CTX"

Make executable:

chmod +x /home/wizard/code/.claude/hooks/agent-state-context.sh

Step 2: Smoke-test

Run:

/home/wizard/code/.claude/hooks/agent-state-context.sh | python3 -m json.tool

Expected: JSON with hookSpecificOutput.additionalContext containing the beads sections (presence section if any claims exist).

Step 3: Commit

git add .claude/hooks/agent-state-context.sh
git commit -m "presence: consolidated UserPromptSubmit hook (beads + presence)"

Task 12: Wire all new hooks in project-level `.claude/settings.json`

Files:

Modify: .claude/settings.json
Step 1: Replace the existing UserPromptSubmit + add new hooks

Edit /home/wizard/code/.claude/settings.json. The existing hook stanza for UserPromptSubmit points at ~/.claude/hooks/beads-task-context.sh. Replace it and add SessionStart/PostToolUse/Stop:

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/home/wizard/code/.claude/hooks/presence-session-start.sh",
            "timeout": 3
          }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/home/wizard/code/.claude/hooks/agent-state-context.sh",
            "timeout": 5
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/home/wizard/code/.claude/hooks/presence-heartbeat.sh",
            "timeout": 3
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/home/wizard/code/.claude/hooks/presence-release.sh",
            "timeout": 3
          }
        ]
      }
    ],
    "PermissionRequest": [
      {
        "matcher": "TaskCreate|TaskUpdate|TodoWrite",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /home/wizard/.claude/hooks/beads-block-builtin-tasks.py",
            "timeout": 3
          }
        ]
      }
    ]
  }
}

Step 2: Validate JSON

Run:

python3 -c "import json; json.load(open('/home/wizard/code/.claude/settings.json'))"
echo "OK"

Expected: OK (no traceback).

Step 3: Commit

git add .claude/settings.json
git commit -m "presence: wire SessionStart/PostToolUse/Stop hooks + swap UserPromptSubmit to consolidated agent-state-context"

Task 13: Add the CLAUDE.md rules

Files:

Modify: CLAUDE.md
Step 1: Append the Agent Presence section to CLAUDE.md

Add the following block to /home/wizard/code/CLAUDE.md, immediately after the "MANDATORY: Complete Tasks Fully" section and before "Repository Overview":

## MANDATORY: Agent Presence — claim before mutating shared infra

**Before any operation that mutates shared infrastructure state, claim the affected resource(s) via the `presence` CLI.** This board is read by every Claude Code session in this repo (via the UserPromptSubmit hook), so other sessions know what's in flight and can avoid colliding with you.

### When to claim (high bar — only mutations)

**Triggers a claim:**
- `terraform apply` / `terragrunt apply`
- `kubectl apply`, `kubectl drain`, `kubectl delete`, `kubectl rollout restart`
- `helm upgrade --install`, `helm uninstall`
- Service restarts (`systemctl restart`, deliberate pod deletes)
- DB schema migrations or destructive queries
- Node-level changes (kernel, drivers, network config, firmware)

**Does NOT trigger a claim:**
- Read-only ops: `kubectl get/describe/logs`, `terraform plan`, `terragrunt plan`, file reads
- In-repo edits not yet applied (editing chart values, drafting Terraform)

### How to claim

```bash
~/code/scripts/presence claim <label> --purpose "<what + why>"

Resource label vocabulary (loose convention — pick the closest fit, use infra:<freeform> as fallback):

Type	Example
`node:<name>`	`node:k8s-node1` (K8s node or VM)
`host:<name>`	`host:proxmox-1`, `host:devvm` (non-cluster machine)
`stack:<name>`	`stack:gpu-operator` (a Terraform stack)
`service:<name>`	`service:fire-planner` (a K8s service / app)
`db:<name>`	`db:pg-cluster` (a database)
`pvc:<ns>/<name>`	`pvc:prometheus/prom-data` (a PVC)
`infra:<freeform>`	catch-all

A session may hold several claims simultaneously (e.g., GPU work claims node:k8s-node1 AND stack:gpu-operator).

Conflict resolution: defer by default

If presence claim reports an existing claim by another session on the same resource:

Read their purpose and claimed_at.
Default: DEFER. Release your own claim (presence release <label>) and tell the user: "Session S1 on host H is currently doing X (since T) — should I wait or proceed?"
Only proceed after explicit user authorization.
The underlying tool's own lock (terraform state, k8s resourceVersion, git) will still prevent silent overwrites — but presence-level deference saves you from blunt-forcing through it.

Lifecycle (you don't need to think about it)

Each claim lives for 15 minutes by default.
The PostToolUse hook silently heartbeats every ~2 min while the session is active, extending TTL.
The Stop hook releases all your claims on session exit.
If a session dies abruptly, its claims expire automatically — no "left unclosed" failure mode.

Discovery

Other sessions' active claims appear in the Currently being mutated by OTHER sessions block of your context on every turn (UserPromptSubmit hook). Before risky ops you can also presence peek <label> for a fresh read.

Presence vs Beads

Beads = TODO list / tracked work (have IDs, persist for days/weeks, closed when shipped).
Presence = live mutations happening right now (no IDs, expire in minutes, auto-released).

Use both. They answer different questions.


- [ ] **Step 2: Verify CLAUDE.md renders**

Run:
```bash
head -160 /home/wizard/code/CLAUDE.md | tail -60

Expected: see the new section in place.

Step 3: Commit

git add CLAUDE.md
git commit -m "CLAUDE.md: add Agent Presence rule (claim before mutating)"

Task 14: Audit and clean up stale beads `in_progress` items

Files: none — this is a beads CLI operation.

The 8 currently in_progress items are exactly the "wrong granularity" failure. Audit each: if done, close; if not actively being worked, revert to open (or blocked if waiting on something).

Step 1: List the in_progress items with descriptions

Run:

bd --db /home/wizard/code/.beads list --status in_progress

Expected: see the 8 items.

Step 2: For each item, decide done / revert / blocked

For each code-XXXX item, run:

bd --db /home/wizard/code/.beads show <id>

Make a per-item decision:

If the underlying work is complete → bd --db /home/wizard/code/.beads close <id> with a note explaining what landed.
If it's truly active in this calendar week → leave in_progress (rare).
If it's stalled / paused → bd --db /home/wizard/code/.beads update <id> --status open and add a bd note <id> explaining the pause.
If it's blocked on external action → bd --db /home/wizard/code/.beads update <id> --status blocked and explain.

Expected: at the end, bd list --status in_progress either is empty or contains only what's actually being worked this calendar week. The 8 stale rows shown at the start of this plan should not all still be in_progress.

Step 3: Verify cleanup

Run:

bd --db /home/wizard/code/.beads list --status in_progress | wc -l

Expected: ≤ 2 items (only what's genuinely active this week).

Step 4: No commit required for this task (beads state lives in Dolt, not git)

Task 15: End-to-end smoke test (two terminal sessions)

Files: none — this is a verification step.

Step 1: From terminal A, claim a fake resource

Run:

cd /home/wizard/code
export CLAUDE_SESSION_ID="wizard@devvm@smoketestA"
./scripts/presence claim infra:smoke-test-a --purpose "verify presence v1"

Expected: presence: claimed infra:smoke-test-a.

Step 2: From terminal B, peek the same label

Run:

cd /home/wizard/code
export CLAUDE_SESSION_ID="wizard@devvm@smoketestB"
./scripts/presence peek infra:smoke-test-a

Expected: shows terminal A's claim — session_id=wizard@devvm@smoketestA, purpose verify presence v1.

Step 3: From terminal B, claim same label — should show conflict

Run:

./scripts/presence claim infra:smoke-test-a --purpose "intentional conflict"

Expected: claimed line PLUS ALSO CLAIMED BY: wizard@devvm@smoketestA ... and DEFER reminder.

Step 4: From terminal B, release

Run:

./scripts/presence release --all-mine
./scripts/presence list --all

Expected: only terminal A's claim remains.

Step 5: Confirm UserPromptSubmit hook surfaces terminal A's claim

From a new Claude Code session, type any message. Expected: the "Currently being mutated by OTHER sessions" block appears in the context (visible in the session's first response or via debug log) showing infra:smoke-test-a claimed by wizard@devvm@smoketestA.

Step 6: Clean up

Run:

export CLAUDE_SESSION_ID="wizard@devvm@smoketestA"
./scripts/presence release --all-mine
./scripts/presence list --all

Expected: no claims listed.

Step 7: Push the branch

git push origin master

Wait for CI / Drone deployment of the beads-server stack to succeed before marking the rollout complete.

Out of scope for v1 (deferred to v2+)

claude-agent-service integration — bake presence into the agent image. (Deferred per Q12 decision.)
Woodpecker CI integration — .woodpecker/*.yml snippets that claim before terraform apply.
BeadBoard UI panel — extend BeadBoard to render a "Currently in flight" section reading from presence_claims.
Hard enforcement — PreToolUse hook that blocks mutating Bash without an active claim. Only add if measurement shows compliance < 50% after a few weeks.
Compliance measurement — log terraform apply invocations and cross-reference with active claims at apply-time, to decide whether to escalate enforcement.

Self-review

Spec coverage: Each of the 14 design decisions has a corresponding implementation task:

Decisions 1-3 (purpose, unit, lifecycle) → Tasks 1, 8, 9, 10
Decision 4 (mechanism) → Tasks 2-7 (CLI) + Task 13 (CLAUDE.md rule)
Decision 5 (storage) → Task 1
Decision 6 (CLI surface) → Tasks 3-6
Decision 7 (discovery) → Tasks 11, 12
Decision 8 (vocab) → Task 13 (CLAUDE.md)
Decision 9 (trigger bar) → Task 13 (CLAUDE.md)
Decision 10 (conflict) → Task 3 (_claim) + Task 13 (rule)
Decision 11 (scope: devvm only) → implicit; out-of-scope items listed above
Decision 12 (beads coexistence) → Tasks 11, 13, 14
Decision 13 (pure rule + visibility) → Tasks 11, 12, 13
Decision 14 (Python single-file) → Tasks 2-7

Placeholder scan: No TBD / TODO / "appropriate error handling" / "similar to" patterns.

Type consistency: Schema column names (session_id, resource_label, purpose, claimed_at, expires_at, host, user, agent_name) used identically in Task 1 (DDL), Task 3 (_claim SQL), Task 4 (_peek, _list SQL), Task 5 (_heartbeat, _release SQL), Task 6 (JSON output keys), Task 11 (hook output formatting).

Risks flagged in plan body, not blockers:

Single-session-per-user limitation in SessionStart hook (Task 8). Documented; override via explicit CLAUDE_SESSION_ID env.
Dolt commit history may grow from heartbeat writes — dolt gc periodically if it bloats.
BeadBoard not updated in v1; will need a follow-up TF PR to render the new table.

46 KiB Raw Blame History

Agent Presence Implementation Plan

File Structure

Task 1: Create presence_claims table on the Dolt server

Task 2: Python CLI scaffolding (argparse + DB connection)

Task 3: claim verb — write to DB, return conflicts

Task 4: peek and list verbs (read paths)

Task 5: heartbeat and release verbs

Task 6: --json output mode (used by the hook)

Task 7: Verify graceful degradation when Dolt is unreachable

Task 8: SessionStart hook — write the session ID

Task 9: PostToolUse heartbeat hook (throttled)

Task 10: Stop hook — release all of my claims

Task 11: Consolidate beads + presence into one UserPromptSubmit hook

Task 12: Wire all new hooks in project-level .claude/settings.json

Task 13: Add the CLAUDE.md rules

Conflict resolution: defer by default

Lifecycle (you don't need to think about it)

Discovery

Presence vs Beads

Task 14: Audit and clean up stale beads in_progress items

Task 15: End-to-end smoke test (two terminal sessions)

Out of scope for v1 (deferred to v2+)

Self-review

46 KiB

Raw Blame History

Task 1: Create `presence_claims` table on the Dolt server

Task 3: `claim` verb — write to DB, return conflicts

Task 4: `peek` and `list` verbs (read paths)

Task 5: `heartbeat` and `release` verbs

Task 6: `--json` output mode (used by the hook)

Task 12: Wire all new hooks in project-level `.claude/settings.json`

Task 14: Audit and clean up stale beads `in_progress` items