openclaw: SSH + tmux task fallback to devvm
Give the OpenClaw pod two new capabilities:
1. Host-tools bundle. New init container `install-host-tools` extracts
openssh-client + dnsutils + tmux + jq + ripgrep + fd + vault + yq +
friends into /tools/host-tools/, with the bookworm-slim libs the
binaries need. PATH + LD_LIBRARY_PATH on the main container point
ld.so at the bundle. Idempotent via /tools/host-tools/.installed-v1
marker; smoke test (ldd-based) fails the init at deploy time if any
binary has unresolved deps. Bundle is ~558 MB on the existing
/srv/nfs/openclaw/tools NFS.
2. devvm SSH + async task pattern. New init `setup-ssh-config` writes
id_rsa/config/known_hosts under /home/node/.openclaw/.ssh; main
container startup symlinks /home/node/.ssh → there. New
/usr/local/bin/openclaw-task wrapper on devvm manages long-running
work as tmux sessions on devvm (sessions and logs survive pod
restarts — they live on devvm, not in the pod). New init container
`seed-devvm-memory-note` drops a markdown note teaching the pattern;
main container startup now runs `openclaw memory index --force` so
the note is searchable on first boot.
Design + verified E2E flow in
docs/plans/2026-05-22-openclaw-devvm-access-design.md. Persistence test
green: spawned a 50s task from pod A, deleted pod A, new pod B saw the
task finish and read its full log.
Pre-existing keel.sh annotation drift on openclaw/{openlobster,
task_webhook} cleaned up in the same apply.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent d9ad973621
commit 7e558de8f0
3 changed files with 694 additions and 2 deletions
269
docs/plans/2026-05-22-openclaw-devvm-access-design.md
Normal file
@@ -0,0 +1,269 @@
|
|||
# OpenClaw devvm access + async task pattern — design
|
||||
|
||||
**Date:** 2026-05-22
|
||||
**Stack:** `infra/stacks/openclaw`
|
||||
**Status:** Approved (in-session, see chat history 2026-05-22)
|
||||
|
||||
## Goal
|
||||
|
||||
Give the OpenClaw pod (running in K8s) two new capabilities:
|
||||
|
||||
1. **Host-tools bundle** — common Linux CLIs the upstream OpenClaw image
|
||||
doesn't ship (`ssh`, `scp`, `vault`, `dig`, `jq`, `yq`, `ripgrep`, `fd`,
|
||||
`gnupg`, `tmux`, etc.). OpenClaw can't `apt install` because the
|
||||
container runs as non-root `node` (uid 1000).
|
||||
2. **devvm async task pattern** — OpenClaw spawns long-running work as
|
||||
`tmux` sessions on devvm, sends prompts via `tmux send-keys`, captures
|
||||
progress via `tmux capture-pane`. Sessions live on devvm, so they
|
||||
survive OpenClaw pod restarts.
|
||||
|
||||
OpenClaw uses this combination as a **trusted fallback** for tasks too
|
||||
expensive, sensitive, or stateful for in-pod execution: Vault lookups,
|
||||
multi-step `claude-code` work, anything needing wizard's full home-lab
|
||||
access.
|
||||
|
||||
## Why now
|
||||
|
||||
- The in-pod sandbox is `security=full` but the container is minimal —
|
||||
no `ssh`, no `vault`, no `dig`, no `tmux`.
|
||||
- The user wants OpenClaw to be a first-line agent that delegates heavy
|
||||
work to the dev VM rather than duplicate that work in a constrained pod.
|
||||
- Long-running work (multi-minute `claude-code` sessions) shouldn't be
|
||||
tied to a single synchronous `claude -p` invocation — needs persistence
|
||||
and pollability.
|
||||
|
||||
## Architecture decision: stay on K8s
|
||||
|
||||
Discussed migrating OpenClaw to run directly on devvm (would obviate the
|
||||
host-tools bundle + most of the SSH setup). Decision: **stay on K8s**.
|
||||
|
||||
Reasons:
|
||||
- Keeps HA (5-node cluster vs single devvm reboot)
|
||||
- Keeps ingress/Authentik/Telegram entry chain intact
|
||||
- Keeps Prometheus scrape + exporter sidecar
|
||||
- Keeps PVC backup pipeline (LVM snapshots + Synology offsite)
|
||||
- Resource isolation — a runaway LLM session can't stress wizard's daily-driver VM
|
||||
- Migration cost is several days; this design is ~150 LoC + an 80-line wrapper
|
||||
|
||||
The mental model — "OpenClaw is sandboxed, delegates to wizard@devvm for
|
||||
trusted heavy lifting" — is a clean security boundary. Worth preserving.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Pod side (`infra/stacks/openclaw/main.tf`)
|
||||
|
||||
Three new init containers are added to the OpenClaw Deployment, after the
existing ones: the two described below, plus `seed-devvm-memory-note`
(see "Memory note"):
|
||||
|
||||
#### Init 5 — `install-host-tools`
|
||||
|
||||
- Image: `debian:bookworm-slim` (matches main container base for glibc compat)
|
||||
- Idempotent: skips if `/tools/host-tools/.installed-v1` exists
|
||||
- `apt-get install --download-only` (as committed, the download step deliberately omits `--no-install-recommends`; see the note in `main.tf`) for:
|
||||
`openssh-client dnsutils iputils-ping wget gnupg jq ripgrep fd-find ncdu htop strace tcpdump tmux unzip`
|
||||
- Iterates `.deb` files in `/var/cache/apt/archives/`, `dpkg-deb -x` each
|
||||
into `/tools/host-tools/root/` (preserves `usr/bin`, `usr/sbin`,
|
||||
`usr/lib` layout)
|
||||
- Downloads static binaries to `/tools/host-tools/bin/`:
|
||||
- `vault` (HashiCorp releases, pinned version)
|
||||
- `yq` (mikefarah/yq GitHub releases, pinned version)
|
||||
- Smoke test: runs `ldd` on each bundled binary and fails the init if any
shared library is reported as `not found` (catches glibc / shared-lib
drift at deploy time, not runtime). `--version` is not used because flag
support varies across the tools.
|
||||
- Writes marker file with version
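
The smoke-test idea can be sketched as a small filter over `ldd` output. This is a minimal illustration, not the init container's actual script: the names `unresolved_libs` and `check_binary` are hypothetical, and the sketch assumes glibc's `ldd` output format, where a missing dependency is printed as `libfoo.so.1 => not found`.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the name of
# every shared library the loader could not resolve (one per line).
unresolved_libs() {
  awk '/=> not found/ { print $1 }'
}

# Hypothetical wrapper: succeed only if the binary has no missing libs.
check_binary() {
  missing=$(ldd "$1" 2>&1 | unresolved_libs)
  if [ -n "$missing" ]; then
    echo "FAIL: $1 is missing: $missing" >&2
    return 1
  fi
  echo "OK: $1"
}
```

A statically linked binary produces no `=> not found` lines, so it passes this check trivially.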
|
||||
|
||||
#### Init 6 — `setup-ssh-config`
|
||||
|
||||
- Image: `debian:bookworm-slim`; installs `openssh-client` transiently
inside the init container so `ssh-keyscan` works without
`LD_LIBRARY_PATH` setup
|
||||
- Runs after `install-host-tools`
|
||||
- Idempotent: skips if `/home/node/.openclaw/.ssh/.configured-v1` exists
|
||||
- Creates `/home/node/.openclaw/.ssh/` (uid 1000)
|
||||
- Copies `/ssh/id_rsa` (tmpfs secret mount) → `~/.ssh/id_rsa` with 0600
|
||||
(the secret tmpfs mount has wider perms that openssh rejects)
|
||||
- Writes `~/.ssh/config`:
|
||||
|
||||
```ssh-config
Host devvm
    HostName 10.0.10.10
    User wizard
    IdentityFile ~/.ssh/id_rsa
    UserKnownHostsFile ~/.ssh/known_hosts
    StrictHostKeyChecking yes
```
|
||||
|
||||
PATH handling on the remote side: devvm's sshd uses the default
|
||||
non-interactive PATH (`/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin`)
|
||||
and does NOT load `~/.profile` or `~/.bashrc` (memory id=740). Client-side
|
||||
`SetEnv PATH=…` doesn't help because sshd's `AcceptEnv` is `LANG LC_*` only.
|
||||
Solution: install the binaries openclaw cares about into `/usr/local/bin/`
|
||||
on devvm (see "Devvm side" below).
|
||||
|
||||
- Pre-seeds `~/.ssh/known_hosts` via `ssh-keyscan -H 10.0.10.10`
|
||||
- Writes marker file
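
Both init containers rely on the same marker-file idiom for idempotency. A minimal sketch, assuming a hypothetical helper `run_once` that is not part of the stack: run the setup command only when the marker is absent, and write the marker only after the command succeeds, so a failed attempt is retried on the next pod start.

```shell
#!/bin/sh
# Hypothetical helper: run_once <marker-file> <command> [args...]
# Skips the command when the marker exists; creates the marker only
# after the command exits 0.
run_once() {
  marker=$1
  shift
  if [ -f "$marker" ]; then
    echo "already done (skipping): $marker"
    return 0
  fi
  "$@" || return 1
  mkdir -p "$(dirname "$marker")"
  touch "$marker"
}
```

Changing the marker name (for example from a `-v1` suffix to `-v2`) forces the step to run again, which matches the "bump suffix to force reinstall" note in `main.tf`.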
|
||||
|
||||
#### Main container
|
||||
|
||||
- `PATH` env updated: prepend
|
||||
`/tools/host-tools/root/usr/bin:/tools/host-tools/root/usr/sbin:/tools/host-tools/bin`
|
||||
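- `LD_LIBRARY_PATH` env added, pointing at the bundled libs under `/tools/host-tools/root`
- Startup command additions: symlink `/home/node/.ssh` to `/home/node/.openclaw/.ssh`, and run `node openclaw.mjs memory index --force` before the gateway starts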
|
||||
|
||||
### Devvm side
|
||||
|
||||
#### `/usr/local/bin/openclaw-task` wrapper
|
||||
|
||||
Canonical source: `infra/stacks/openclaw/files/openclaw-task.sh`.
|
||||
Installed to devvm at `/usr/local/bin/openclaw-task` (`sudo cp`, `sudo
|
||||
chmod +x`) so non-interactive SSH finds it on the default PATH without
|
||||
needing `~/.profile`. Updates: re-run the install steps from the
|
||||
canonical source.
|
||||
|
||||
Also: `sudo ln -s /home/wizard/.local/bin/claude /usr/local/bin/claude`
|
||||
so `ssh devvm claude …` works in non-interactive mode. `vault` and `tmux`
|
||||
are already at `/usr/bin/` (system packages) so no symlink needed for
|
||||
those.
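
The install steps could be scripted roughly as below. This is a sketch under assumptions: it swaps the `cp` plus `chmod +x` pair for coreutils `install -m 0755`, which copies the file and sets its mode in one step, and the function name `install_wrapper` is hypothetical. On devvm it would need to run as root (for example via `sudo`).

```shell
#!/bin/sh
# Hypothetical helper: install_wrapper <source-script> <dest-path>
install_wrapper() {
  src=$1
  dest=$2
  [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
  # Copy the script and set mode 0755 in a single step.
  install -m 0755 "$src" "$dest"
}

# Example, run as root from the repo root:
#   install_wrapper stacks/openclaw/files/openclaw-task.sh /usr/local/bin/openclaw-task
```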
|
||||
|
||||
Bash script (`#!/usr/bin/env bash`, uses `set -o pipefail` and `local`). Subcommands:
|
||||
|
||||
| Subcommand | Behavior |
|---|---|
| `new <id> <cmd...>` | Spawns detached tmux session `openclaw-task-<id>`, pipes pane output to `~/openclaw-tasks/<id>.log` |
| `claude <id> <prompt>` | Convenience: spawns interactive `claude` in a tmux session, send-keys the prompt + Enter |
| `send <id> <keys...>` | `tmux send-keys -t openclaw-task-<id> "$@"`; caller supplies `Enter` literal if needed |
| `capture <id> [lines]` | `tmux capture-pane -t … -p -S -<lines>` (default last 1000) |
| `log <id>` | `cat ~/openclaw-tasks/<id>.log` |
| `tail <id>` | `tail -n 100 -f ~/openclaw-tasks/<id>.log` (mainly for human ops) |
| `list` | tmux session list filtered to `openclaw-task-*`, one id per line |
| `status <id>` | `running` if tmux session alive, `ended` otherwise |
| `kill <id>` | `tmux kill-session -t openclaw-task-<id>` (log file is kept) |
| `purge <id>` | `kill` + `rm -f ~/openclaw-tasks/<id>.log` |
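
Since `status` prints either `running` or `ended`, a caller can poll it until the task finishes. The loop below is a sketch, not part of the wrapper: `wait_for_task` and the `OPENCLAW_STATUS_CMD` override are hypothetical names introduced here so the loop can be exercised without a real devvm.

```shell
#!/bin/sh
# Hypothetical helper: wait_for_task <id> [timeout-seconds] [poll-interval-seconds]
# Returns 0 once status is "ended", 1 on timeout, 2 if the status command fails.
# OPENCLAW_STATUS_CMD is an assumed override hook for testing.
wait_for_task() {
  id=$1
  timeout=${2:-600}
  interval=${3:-5}
  status_cmd=${OPENCLAW_STATUS_CMD:-ssh devvm openclaw-task status}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Intentionally unquoted: the command string is split into words.
    state=$($status_cmd "$id") || return 2
    [ "$state" = "ended" ] && return 0
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}
```

Note that `status` also prints `ended` for an id that never existed, so a caller that cares about the difference would check `list` or `log` as well.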
|
||||
|
||||
Task state lives entirely on devvm:
|
||||
|
||||
- tmux sessions persist across SSH disconnects and OpenClaw pod restarts
|
||||
- `~/openclaw-tasks/<id>.log` is the durable transcript even after a
|
||||
session is killed
|
||||
- No central database — `tmux list-sessions` is the source of truth for
|
||||
"what's running"
|
||||
|
||||
Naming convention: tmux sessions are prefixed `openclaw-task-` so they
|
||||
don't collide with wizard's own tmux work (`0`, `Openclaw`, `read-only`).
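
Because the task id ends up in both a tmux session name and a log file name, it may be worth restricting its character set. tmux uses `:` and `.` as separators in target specifications, and a `/` would point the log path outside the task directory. The check below is a sketch; `valid_task_id` is a hypothetical function that the committed wrapper does not contain.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for ids made of letters, digits,
# "_" and "-", so the id is safe in `openclaw-task-<id>` and `<id>.log`.
valid_task_id() {
  case $1 in
    '' | *[!A-Za-z0-9_-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```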
|
||||
|
||||
### Memory note
|
||||
|
||||
File at `/workspace/memory/projects/openclaw-runtime/devvm-fallback.md`
|
||||
teaching OpenClaw the pattern. Indexed by the existing daily
|
||||
`memory-sync` CronJob (or via manual `node openclaw.mjs memory index
|
||||
--force` for the initial seed).
|
||||
|
||||
Content (verbatim):
|
||||
|
||||
```markdown
|
||||
# Using devvm as a fallback
|
||||
|
||||
When in-pod tools/permissions block you, SSH to devvm and use it. The
|
||||
devvm runs as wizard with full home-lab access (Vault, kubectl, git
|
||||
repos, Cloudflare, etc.) and has Claude Code v2+ installed.
|
||||
|
||||
## One-shot lookup
|
||||
ssh devvm 'vault kv get -field=brave_api_key secret/openclaw'
|
||||
ssh devvm 'claude -p "investigate why frigate is restarting"'
|
||||
|
||||
## Long-running async work — USE THIS for anything > ~2 min
|
||||
Spawn in a tmux session on devvm. Sessions survive OpenClaw pod restarts.
|
||||
|
||||
# spawn
|
||||
ssh devvm openclaw-task new my-task "claude -p --dangerously-skip-permissions 'do the thing'"
|
||||
|
||||
# poll progress (last 1000 lines of pane)
|
||||
ssh devvm openclaw-task capture my-task
|
||||
|
||||
# interactive claude (send follow-up prompts)
|
||||
ssh devvm openclaw-task claude my-task "initial prompt"
|
||||
ssh devvm openclaw-task send my-task "follow-up prompt" Enter
|
||||
|
||||
# housekeeping
|
||||
ssh devvm openclaw-task list
|
||||
ssh devvm openclaw-task status my-task
|
||||
ssh devvm openclaw-task kill my-task
|
||||
|
||||
Logs persist at ~/openclaw-tasks/<id>.log on devvm even after a session
|
||||
is killed. Use `ssh devvm openclaw-task log <id>` to retrieve them.
|
||||
```
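
One thing to watch in the commands above: `ssh` joins its command arguments with spaces and the remote shell parses the result a second time, so inner quotes are consumed before `openclaw-task` sees them. A possible way to keep a multi-word command intact is to single-quote each argument before handing it to `ssh`. The helper `sq` below is a hypothetical sketch, not part of the wrapper or the note.

```shell
#!/bin/sh
# Hypothetical helper: print each argument wrapped in single quotes, with
# embedded single quotes escaped as '\'' so a POSIX shell re-parses every
# argument as exactly one word. Trailing newlines in an argument are lost.
sq() {
  for arg in "$@"; do
    printf "'%s' " "$(printf '%s' "$arg" | sed "s/'/'\\\\''/g")"
  done
}

# Example: the whole command reaches openclaw-task as one argument.
#   ssh devvm "openclaw-task new my-task $(sq "claude -p 'do the thing'")"
```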
|
||||
|
||||
## Devvm: no infra changes
|
||||
|
||||
Pre-existing state verified 2026-05-22:
|
||||
|
||||
- pubkey from `/ssh/id_rsa` (Vault `secret/openclaw → ssh_key`) matches the
|
||||
`ssh-ed25519 AAAA…lug node@openclaw-58cd9f7987-884bv` line in
|
||||
`~/.ssh/authorized_keys` (the comment is a stale pod name; the key
|
||||
itself is stable from Vault)
|
||||
- sshd listens on 0.0.0.0:22 ✓
|
||||
- `claude` v2.1.126 at `/home/wizard/.local/bin/claude` ✓
|
||||
- `tmux` 3.4 installed, server already running with existing user sessions ✓
|
||||
|
||||
Only changes (one-time, done in the same session via `sudo`):
|
||||
- Install `openclaw-task` wrapper to `/usr/local/bin/openclaw-task`
|
||||
- Symlink `/usr/local/bin/claude` → `/home/wizard/.local/bin/claude`
|
||||
|
||||
## Tradeoffs / risks
|
||||
|
||||
- **Bundle size on NFS**: estimated at ~30MB extracted when this design was
written; the commit message reports ~558 MB for the bundle as built.
Acceptable on `/srv/nfs/openclaw/tools`.
|
||||
- **Library version drift**: bundled binaries link against bookworm libs.
|
||||
Smoke test in `install-host-tools` catches breakage on the next pod
|
||||
restart if upstream OpenClaw image rebases.
|
||||
- **Full-shell SSH**: explicit user choice. Blast radius if openclaw is
|
||||
prompt-injected = full wizard access. Mitigation: keep OpenClaw's
|
||||
plugin allowlist tight (current allow list: `memory-core, recruiter-api,
|
||||
telegram, openrouter, brave, openai, codex`).
|
||||
- **tmux server lifecycle on devvm**: if wizard's tmux server dies (rare —
|
||||
usually only on devvm reboot), in-flight openclaw tasks are killed.
|
||||
Acceptable for home lab. Task logs persist regardless.
|
||||
- **Task log unbounded growth**: `~/openclaw-tasks/*.log` grows forever.
|
||||
Out of scope here. User can add a `find -mtime +N -delete` cron later.
|
||||
- **Init container order**: `setup-ssh-config` depends on
|
||||
`install-host-tools` finishing first. K8s init containers run
|
||||
sequentially in declaration order — natural ordering, no explicit
|
||||
dependency mechanism needed.
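
The log-pruning follow-up mentioned in the list above could look like the sketch below. `prune_task_logs` is a hypothetical function and the 30-day default is an arbitrary assumption; it relies on the `-maxdepth`, `-mtime` and `-delete` options found in GNU and BSD `find`.

```shell
#!/bin/sh
# Hypothetical helper: prune_task_logs [dir] [days]
# Deletes *.log files directly under <dir> that were last modified more
# than <days> days ago. Other files are left alone.
prune_task_logs() {
  dir=${1:-$HOME/openclaw-tasks}
  days=${2:-30}
  [ -d "$dir" ] || return 0
  find "$dir" -maxdepth 1 -type f -name '*.log' -mtime "+$days" -delete
}
```

A log that is still being written keeps a fresh modification time, so an actively running task is unlikely to be pruned; a long-idle session could be, which is worth keeping in mind when picking the age threshold.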
|
||||
|
||||
## Testing — E2E flows required by user
|
||||
|
||||
1. **Tools present**:
|
||||
`kubectl -n openclaw exec <pod> -c openclaw -- ssh -V` returns version,
|
||||
same for `dig`, `vault`, `jq`, `yq`, `tmux`, `rg`.
|
||||
2. **SSH happy path**:
|
||||
`kubectl -n openclaw exec <pod> -c openclaw -- ssh devvm 'hostname'`
|
||||
returns `devvm`.
|
||||
3. **Claude one-shot**:
|
||||
`kubectl -n openclaw exec <pod> -c openclaw -- ssh devvm 'claude -p "what is 1+1"'`
|
||||
returns `2`.
|
||||
4. **Async task lifecycle**:
|
||||
- `ssh devvm openclaw-task new test-1 "sleep 30; echo done"`
|
||||
- `ssh devvm openclaw-task list` contains `test-1`
|
||||
- `ssh devvm openclaw-task status test-1` returns `running`
|
||||
- wait 35s
|
||||
- `ssh devvm openclaw-task log test-1` contains `done`
|
||||
- `ssh devvm openclaw-task status test-1` returns `ended`
|
||||
5. **Persistence test** (the key requirement):
|
||||
- Spawn long task: `ssh devvm openclaw-task new persist-1 "sleep 120; echo survived > /tmp/persist-1.proof"`
|
||||
- `kubectl -n openclaw delete pod <openclaw-pod>` — pod recreated
|
||||
- Wait for new pod ready (init containers run, skip via marker, fast)
|
||||
- `kubectl -n openclaw exec <new-pod> -c openclaw -- ssh devvm openclaw-task list`
|
||||
contains `persist-1`
|
||||
- Wait for original sleep to finish; verify `/tmp/persist-1.proof`
|
||||
contains `survived` from new pod
|
||||
6. **Memory note lookup**:
|
||||
`kubectl -n openclaw exec <pod> -c openclaw -- node openclaw.mjs memory search 'devvm fallback'`
|
||||
returns the note.
|
||||
|
||||
## Docs to update with the change
|
||||
|
||||
- `infra/docs/plans/2026-05-22-openclaw-devvm-access-design.md` (this doc)
|
||||
- `infra/docs/plans/2026-05-22-openclaw-devvm-access-plan.md` (implementation plan)
|
||||
- `infra/.claude/reference/service-catalog.md` (one-line addition under
|
||||
OpenClaw: "Has SSH to devvm with host-tools bundle; long-running async
|
||||
tasks via `openclaw-task` wrapper on devvm")
|
||||
- `infra/.claude/CLAUDE.md` "Known Issues" section is left alone — none of
|
||||
the existing OpenClaw caveats change.
|
||||
184
stacks/openclaw/files/openclaw-task.sh
Normal file
@@ -0,0 +1,184 @@
|
|||
#!/usr/bin/env bash
|
||||
# openclaw-task — manage long-running tmux tasks on devvm
|
||||
#
|
||||
# Canonical source: infra/stacks/openclaw/files/openclaw-task.sh
|
||||
# Installed to /usr/local/bin/openclaw-task on devvm so non-interactive
|
||||
# SSH (e.g. `ssh devvm openclaw-task list`) finds it on the default PATH.
|
||||
#
|
||||
# Sessions are prefixed `openclaw-task-` to avoid colliding with the
|
||||
# user's own tmux work. Persistent transcripts live in
|
||||
# ~/openclaw-tasks/<id>.log via `tmux pipe-pane`. Sessions and logs
|
||||
# survive OpenClaw pod restarts (they live on devvm, not in the pod).
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# Use full paths because non-interactive SSH does not source ~/.profile
|
||||
# or ~/.bashrc (see memory id=740).
|
||||
TMUX_BIN=/usr/bin/tmux
|
||||
CLAUDE_BIN=/usr/local/bin/claude # installed as symlink to /home/wizard/.local/bin/claude
|
||||
|
||||
PREFIX=openclaw-task-
|
||||
TASK_DIR=${OPENCLAW_TASK_DIR:-$HOME/openclaw-tasks}
|
||||
mkdir -p "$TASK_DIR"
|
||||
|
||||
die() { echo "openclaw-task: $*" >&2; exit 1; }
|
||||
|
||||
session_name() { printf '%s%s' "$PREFIX" "$1"; }
|
||||
|
||||
require_session() {
|
||||
local name="$1"
# "=" asks tmux for an exact session-name match; without it tmux falls back to prefix matching.
"$TMUX_BIN" has-session -t "=$name" 2>/dev/null || die "no session '$name' (use 'openclaw-task list')"
|
||||
}
|
||||
|
||||
usage() {
|
||||
cat <<EOF
|
||||
openclaw-task — manage long-running tmux tasks on devvm
|
||||
|
||||
USAGE
|
||||
openclaw-task new <id> <command...> spawn detached tmux session
|
||||
openclaw-task claude <id> [prompt...] spawn interactive claude in a session;
|
||||
if prompt given, send-keys it + Enter
|
||||
openclaw-task send <id> <keys...> tmux send-keys passthrough (you must
|
||||
pass 'Enter' literal for newline)
|
||||
openclaw-task capture <id> [lines] last <lines> of pane (default 1000)
|
||||
openclaw-task log <id> cat the persistent pipe-pane log
|
||||
openclaw-task tail <id> tail -f the persistent log
|
||||
openclaw-task list all openclaw task ids (one per line)
|
||||
openclaw-task status <id> 'running' or 'ended'
|
||||
openclaw-task kill <id> kill session (log file kept)
|
||||
openclaw-task purge <id> kill + delete log file
|
||||
|
||||
EXAMPLES
|
||||
openclaw-task new build-foo "cd ~/code/foo && make all 2>&1"
|
||||
openclaw-task claude diag-frigate
|
||||
openclaw-task send diag-frigate "investigate gpu crashloop" Enter
|
||||
openclaw-task capture diag-frigate 200
|
||||
openclaw-task list
|
||||
EOF
|
||||
}
|
||||
|
||||
cmd_new() {
|
||||
[ $# -lt 2 ] && die "usage: openclaw-task new <id> <command...>"
|
||||
local id="$1"; shift
|
||||
local name; name=$(session_name "$id")
|
||||
if "$TMUX_BIN" has-session -t "=$name" 2>/dev/null; then
|
||||
die "session '$name' already exists"
|
||||
fi
|
||||
local log="$TASK_DIR/$id.log"
|
||||
: > "$log"
|
||||
# Start an idle interactive bash so pipe-pane can attach BEFORE the
|
||||
# user's command runs. If we passed the command directly to
|
||||
# new-session, its first lines beat pipe-pane to the pane and never
|
||||
# land in the log.
|
||||
"$TMUX_BIN" new-session -d -s "$name" bash --norc -i
|
||||
"$TMUX_BIN" pipe-pane -o -t "$name" "cat >> '$log'"
|
||||
sleep 0.2
|
||||
"$TMUX_BIN" send-keys -t "$name" "$*" Enter
|
||||
# Auto-exit propagating the command's status so the tmux session
|
||||
# ends when the command does.
|
||||
"$TMUX_BIN" send-keys -t "$name" 'exit $?' Enter
|
||||
printf 'session: %s\nlog: %s\n' "$name" "$log"
|
||||
}
|
||||
|
||||
cmd_claude() {
|
||||
[ $# -lt 1 ] && die "usage: openclaw-task claude <id> [prompt...]"
|
||||
local id="$1"; shift
|
||||
local name; name=$(session_name "$id")
|
||||
if "$TMUX_BIN" has-session -t "=$name" 2>/dev/null; then
|
||||
die "session '$name' already exists (use 'send' to add prompts)"
|
||||
fi
|
||||
local log="$TASK_DIR/$id.log"
|
||||
: > "$log"
|
||||
# sleep+exec lets pipe-pane attach before claude prints its banner.
|
||||
"$TMUX_BIN" new-session -d -s "$name" bash -c "sleep 0.3; exec '$CLAUDE_BIN'"
|
||||
"$TMUX_BIN" pipe-pane -o -t "$name" "cat >> '$log'"
|
||||
if [ $# -gt 0 ]; then
|
||||
# Wait for claude to come up before sending the prompt
|
||||
sleep 2
|
||||
"$TMUX_BIN" send-keys -t "$name" "$*" Enter
|
||||
fi
|
||||
printf 'session: %s\nlog: %s\n' "$name" "$log"
|
||||
}
|
||||
|
||||
cmd_send() {
|
||||
[ $# -lt 2 ] && die "usage: openclaw-task send <id> <keys...>"
|
||||
local id="$1"; shift
|
||||
local name; name=$(session_name "$id")
|
||||
require_session "$name"
|
||||
"$TMUX_BIN" send-keys -t "$name" "$@"
|
||||
}
|
||||
|
||||
cmd_capture() {
|
||||
[ $# -lt 1 ] && die "usage: openclaw-task capture <id> [lines]"
|
||||
local id="$1"
|
||||
local lines="${2:-1000}"
|
||||
local name; name=$(session_name "$id")
|
||||
require_session "$name"
|
||||
"$TMUX_BIN" capture-pane -t "$name" -p -S "-$lines"
|
||||
}
|
||||
|
||||
cmd_log() {
|
||||
[ $# -lt 1 ] && die "usage: openclaw-task log <id>"
|
||||
local id="$1"
|
||||
local log="$TASK_DIR/$id.log"
|
||||
[ -f "$log" ] || die "no log file for '$id' (looked at $log)"
|
||||
cat "$log"
|
||||
}
|
||||
|
||||
cmd_tail() {
|
||||
[ $# -lt 1 ] && die "usage: openclaw-task tail <id>"
|
||||
local id="$1"
|
||||
local log="$TASK_DIR/$id.log"
|
||||
[ -f "$log" ] || die "no log file for '$id' (looked at $log)"
|
||||
tail -n 100 -f "$log"
|
||||
}
|
||||
|
||||
cmd_list() {
|
||||
"$TMUX_BIN" list-sessions -F '#{session_name}' 2>/dev/null \
|
||||
| grep "^$PREFIX" \
|
||||
| sed "s|^$PREFIX||" \
|
||||
|| true
|
||||
}
|
||||
|
||||
cmd_status() {
|
||||
[ $# -lt 1 ] && die "usage: openclaw-task status <id>"
|
||||
local id="$1"
|
||||
local name; name=$(session_name "$id")
|
||||
if "$TMUX_BIN" has-session -t "=$name" 2>/dev/null; then
|
||||
echo running
|
||||
else
|
||||
echo ended
|
||||
fi
|
||||
}
|
||||
|
||||
cmd_kill() {
|
||||
[ $# -lt 1 ] && die "usage: openclaw-task kill <id>"
|
||||
local id="$1"
|
||||
local name; name=$(session_name "$id")
|
||||
require_session "$name"
|
||||
"$TMUX_BIN" kill-session -t "$name"
|
||||
}
|
||||
|
||||
cmd_purge() {
|
||||
[ $# -lt 1 ] && die "usage: openclaw-task purge <id>"
|
||||
local id="$1"
|
||||
local name; name=$(session_name "$id")
|
||||
"$TMUX_BIN" kill-session -t "$name" 2>/dev/null || true
|
||||
rm -f "$TASK_DIR/$id.log"
|
||||
echo "purged: $id"
|
||||
}
|
||||
|
||||
case "${1:-help}" in
|
||||
new) shift; cmd_new "$@" ;;
|
||||
claude) shift; cmd_claude "$@" ;;
|
||||
send) shift; cmd_send "$@" ;;
|
||||
capture) shift; cmd_capture "$@" ;;
|
||||
log) shift; cmd_log "$@" ;;
|
||||
tail) shift; cmd_tail "$@" ;;
|
||||
list) shift; cmd_list "$@" ;;
|
||||
status) shift; cmd_status "$@" ;;
|
||||
kill) shift; cmd_kill "$@" ;;
|
||||
purge) shift; cmd_purge "$@" ;;
|
||||
help|-h|--help) usage ;;
|
||||
*) usage; exit 2 ;;
|
||||
esac
|
||||
stacks/openclaw/main.tf
@@ -496,6 +496,223 @@ resource "kubernetes_deployment" "openclaw" {
|
|||
}
|
||||
}
|
||||
|
||||
# Init 4: install host-tools bundle (ssh, vault, jq, ripgrep, tmux, …)
|
||||
# into /tools/host-tools/ so the in-pod agent reaches CLI parity
|
||||
# with the dev VM. Upstream OpenClaw image is minimal Debian
|
||||
# bookworm running as uid 1000 — can't apt-install at runtime.
|
||||
# Idempotent via marker file; bump suffix to force reinstall.
|
||||
# See docs/plans/2026-05-22-openclaw-devvm-access-design.md.
|
||||
init_container {
|
||||
name = "install-host-tools"
|
||||
image = "debian:bookworm-slim"
|
||||
command = ["bash", "-c", <<-EOT
|
||||
set -euo pipefail
|
||||
DEST=/tools/host-tools
|
||||
MARKER="$DEST/.installed-v1"
|
||||
if [ -f "$MARKER" ]; then
|
||||
echo "host-tools v1 already installed (skipping)"
|
||||
exit 0
|
||||
fi
|
||||
echo "installing host-tools v1 ..."
|
||||
rm -rf "$DEST"
|
||||
mkdir -p "$DEST/root" "$DEST/bin"
|
||||
|
||||
export DEBIAN_FRONTEND=noninteractive
|
||||
apt-get update -qq
|
||||
# debian:bookworm-slim doesn't ship wget/unzip; install
|
||||
# transiently into this init container's filesystem so we
|
||||
# can download the static binaries below.
|
||||
apt-get install -y --no-install-recommends wget unzip ca-certificates
|
||||
|
||||
# NOTE: we deliberately do NOT pass --no-install-recommends to
|
||||
# the download step. ssh links against libgssapi-krb5-2 which
|
||||
# is a hard Depends but its transitive deps (libkrb5-3 etc.)
|
||||
# need to come along too. The bundle is a self-contained
|
||||
# /usr-like tree that the openclaw container can use via
|
||||
# LD_LIBRARY_PATH, so missing deps = broken binaries.
|
||||
APT_PKGS="openssh-client dnsutils iputils-ping wget gnupg jq ripgrep fd-find ncdu htop strace tcpdump tmux unzip ca-certificates"
|
||||
apt-get install -y --download-only $APT_PKGS
|
||||
|
||||
for d in /var/cache/apt/archives/*.deb; do
|
||||
dpkg-deb -x "$d" "$DEST/root/"
|
||||
done
|
||||
|
||||
VAULT_VER=1.18.3
|
||||
YQ_VER=v4.44.3
|
||||
wget -qO /tmp/vault.zip \
|
||||
"https://releases.hashicorp.com/vault/$${VAULT_VER}/vault_$${VAULT_VER}_linux_amd64.zip"
|
||||
unzip -o /tmp/vault.zip vault -d "$DEST/bin/"
|
||||
chmod +x "$DEST/bin/vault"
|
||||
wget -qO "$DEST/bin/yq" \
|
||||
"https://github.com/mikefarah/yq/releases/download/$${YQ_VER}/yq_linux_amd64"
|
||||
chmod +x "$DEST/bin/yq"
|
||||
|
||||
# Smoke test — fail init if any bundled binary has unresolved
|
||||
# shared-lib deps, so glibc / shared-lib drift surfaces at
|
||||
# deploy time. We don't run --version because flag support
|
||||
# varies (older scp returns non-zero, ping/nslookup use weird
|
||||
# conventions). ldd is the reliable signal: if any "not
|
||||
# found" appears, the binary won't load when called.
|
||||
# LD_LIBRARY_PATH points ld.so at the bundled libs (the
|
||||
# openclaw main container sets the same env).
|
||||
export PATH="$DEST/root/usr/bin:$DEST/root/usr/sbin:$DEST/root/bin:$DEST/root/sbin:$DEST/bin:$PATH"
|
||||
export LD_LIBRARY_PATH="$DEST/root/usr/lib/x86_64-linux-gnu:$DEST/root/lib/x86_64-linux-gnu"
|
||||
for t in ssh scp ssh-keyscan dig host nslookup ping wget gpg jq rg fdfind tmux vault yq; do
|
||||
bin=$(command -v "$t" 2>/dev/null) || { echo "FAIL: $t not on PATH"; exit 1; }
|
||||
if ldd "$bin" 2>&1 | grep -q "not found"; then
|
||||
echo "FAIL: $t has unresolved shared libs:"
|
||||
ldd "$bin"
|
||||
exit 1
|
||||
fi
|
||||
echo "OK: $t"
|
||||
done
|
||||
|
||||
chown -R 1000:1000 "$DEST"
|
||||
touch "$MARKER"
|
||||
echo "host-tools v1 install complete ($(du -sh "$DEST" | cut -f1))"
|
||||
EOT
|
||||
]
|
||||
volume_mount {
|
||||
name = "tools"
|
||||
mount_path = "/tools"
|
||||
}
|
||||
resources {
|
||||
requests = { cpu = "100m", memory = "256Mi" }
|
||||
limits = { memory = "512Mi" }
|
||||
}
|
||||
}
|
||||
|
||||
# Init 5: write /home/node/.openclaw/.ssh/{id_rsa,config,known_hosts}
|
||||
# so the agent can `ssh devvm` without device-trust prompts. The
|
||||
# main container symlinks /home/node/.ssh → here at startup so
|
||||
# the ssh client picks it up via $HOME/.ssh. Installs
|
||||
# openssh-client transiently into this init container so
|
||||
# ssh-keyscan works without LD_LIBRARY_PATH gymnastics.
|
||||
init_container {
|
||||
name = "setup-ssh-config"
|
||||
image = "debian:bookworm-slim"
|
||||
command = ["bash", "-c", <<-EOT
|
||||
set -euo pipefail
|
||||
SSH=/home/node/.openclaw/.ssh
|
||||
MARKER="$SSH/.configured-v1"
|
||||
if [ -f "$MARKER" ]; then
|
||||
echo "ssh-config v1 already set up (skipping)"
|
||||
exit 0
|
||||
fi
|
||||
echo "installing openssh-client for ssh-keyscan ..."
|
||||
export DEBIAN_FRONTEND=noninteractive
|
||||
apt-get update -qq
|
||||
apt-get install -y --no-install-recommends openssh-client >/dev/null
|
||||
|
||||
echo "configuring ssh ..."
|
||||
mkdir -p "$SSH"
|
||||
|
||||
# Copy the secret-mounted private key into ~/.ssh with 0600 —
|
||||
# the secret's tmpfs mount has wider perms (1777 + symlinks)
|
||||
# that openssh refuses.
|
||||
cp /ssh/id_rsa "$SSH/id_rsa"
|
||||
chmod 0600 "$SSH/id_rsa"
|
||||
|
||||
cat > "$SSH/config" <<'SSH_EOF'
|
||||
Host devvm
|
||||
HostName 10.0.10.10
|
||||
User wizard
|
||||
IdentityFile ~/.ssh/id_rsa
|
||||
UserKnownHostsFile ~/.ssh/known_hosts
|
||||
StrictHostKeyChecking yes
|
||||
SSH_EOF
|
||||
chmod 0600 "$SSH/config"
|
||||
|
||||
ssh-keyscan -H 10.0.10.10 > "$SSH/known_hosts" 2>/tmp/keyscan.err
|
||||
if [ ! -s "$SSH/known_hosts" ]; then
|
||||
echo "ssh-keyscan produced empty known_hosts; stderr:"
|
||||
cat /tmp/keyscan.err
|
||||
exit 1
|
||||
fi
|
||||
chmod 0644 "$SSH/known_hosts"
|
||||
|
||||
chown -R 1000:1000 "$SSH"
|
||||
touch "$MARKER"
|
||||
echo "ssh-config v1 set up"
|
||||
EOT
|
||||
]
|
||||
volume_mount {
|
||||
name = "openclaw-home"
|
||||
mount_path = "/home/node/.openclaw"
|
||||
}
|
||||
volume_mount {
|
||||
name = "ssh-key"
|
||||
mount_path = "/ssh"
|
||||
}
|
||||
resources {
|
||||
requests = { cpu = "50m", memory = "64Mi" }
|
||||
limits = { memory = "256Mi" }
|
||||
}
|
||||
}
|
||||
|
||||
# Init 6: seed the devvm-fallback memory note into
|
||||
# /workspace/memory/projects/openclaw-runtime/. The note teaches
|
||||
# openclaw the SSH+tmux pattern. The main container's startup
|
||||
# runs `memory index --force` so it's searchable immediately;
|
||||
# the daily memory-sync CronJob also keeps it indexed afterward.
|
||||
# Always rewrites — the configmap-baked note is canonical.
|
||||
init_container {
|
||||
name = "seed-devvm-memory-note"
|
||||
image = "busybox:1.37"
|
||||
command = ["sh", "-c", <<-EOT
|
||||
set -eu
|
||||
DIR=/workspace/memory/projects/openclaw-runtime
|
||||
mkdir -p "$DIR"
|
||||
cat > "$DIR/devvm-fallback.md" <<'NOTE_EOF'
|
||||
# Using devvm as a fallback
|
||||
|
||||
When in-pod tools/permissions block you, SSH to devvm and use it.
|
||||
Devvm runs as wizard with full home-lab access (Vault, kubectl,
|
||||
git repos, Cloudflare, etc.) and Claude Code v2+ is installed.
|
||||
|
||||
## One-shot lookup
|
||||
|
||||
ssh devvm 'vault kv get -field=brave_api_key secret/openclaw'
|
||||
ssh devvm 'claude -p "investigate why frigate is restarting"'
|
||||
|
||||
## Long-running async work — USE THIS for anything > ~2 min
|
||||
|
||||
Spawn in a tmux session on devvm. Sessions and logs survive
|
||||
OpenClaw pod restarts (they live on devvm, not in this pod).
|
||||
|
||||
# spawn
|
||||
ssh devvm openclaw-task new my-task "claude -p --dangerously-skip-permissions 'do the thing'"
|
||||
|
||||
# poll progress (last 1000 lines of pane)
|
||||
ssh devvm openclaw-task capture my-task
|
||||
|
||||
# interactive claude (send follow-up prompts)
|
||||
ssh devvm openclaw-task claude my-task "initial prompt"
|
||||
ssh devvm openclaw-task send my-task "follow-up prompt" Enter
|
||||
|
||||
# housekeeping
|
||||
ssh devvm openclaw-task list
|
||||
ssh devvm openclaw-task status my-task
|
||||
ssh devvm openclaw-task kill my-task
|
||||
|
||||
Logs persist at ~/openclaw-tasks/<id>.log on devvm even after a
|
||||
session is killed. Use `ssh devvm openclaw-task log <id>` to
|
||||
retrieve them.
|
||||
NOTE_EOF
|
||||
chown -R 1000:1000 "$DIR"
|
||||
echo "memory note seeded at $DIR/devvm-fallback.md"
|
||||
EOT
|
||||
]
|
||||
volume_mount {
|
||||
name = "workspace"
|
||||
mount_path = "/workspace"
|
||||
}
|
||||
resources {
|
||||
requests = { cpu = "10m", memory = "32Mi" }
|
||||
limits = { memory = "32Mi" }
|
||||
}
|
||||
}
|
||||
|
||||
# Main container: OpenClaw
|
||||
container {
|
||||
name = "openclaw"
|
||||
|
|
@@ -511,6 +728,11 @@ resource "kubernetes_deployment" "openclaw" {
|
|||
# others hard-coded.
|
||||
# 4. gateway — exec into the gateway process
|
||||
command = ["sh", "-c", <<-EOC
|
||||
# Symlink /home/node/.ssh → persistent .ssh so the ssh client
|
||||
# finds id_rsa/config/known_hosts via $HOME/.ssh. HOME is
|
||||
# /home/node (image overlay), .ssh files live on the PVC
|
||||
# at /home/node/.openclaw/.ssh (set up by init 5).
|
||||
ln -sfn /home/node/.openclaw/.ssh /home/node/.ssh
|
||||
node openclaw.mjs doctor --fix 2>/dev/null
|
||||
node openclaw.mjs models set openai-codex/gpt-5.4-mini 2>/dev/null
|
||||
node openclaw.mjs mcp set ha "{\"url\":\"$HA_SOFIA_MCP_URL\",\"transport\":\"streamable-http\"}" 2>/dev/null
|
||||
|
|
@@ -522,6 +744,10 @@ resource "kubernetes_deployment" "openclaw" {
|
|||
echo '{"plugins":{"allow":["memory-core","recruiter-api","telegram","openrouter","brave","openai","codex"]}}' \
|
||||
| node openclaw.mjs config patch --stdin 2>/dev/null || true
|
||||
node openclaw.mjs plugins enable recruiter-api 2>/dev/null || true
|
||||
# Reindex memory-core so the seeded devvm-fallback note (and
|
||||
# anything else dropped under /workspace/memory/) is searchable
|
||||
# on first boot; daily memory-sync CronJob also keeps it indexed.
|
||||
node openclaw.mjs memory index --force 2>/dev/null || true
|
||||
exec node openclaw.mjs gateway --allow-unconfigured --bind lan
|
||||
EOC
|
||||
]
|
||||
|
|
@@ -544,8 +770,21 @@ resource "kubernetes_deployment" "openclaw" {
|
|||
value = random_password.gateway_token.result
|
||||
}
|
||||
env {
|
||||
- name = "PATH"
- value = "/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
+ name = "PATH"
|
||||
# Host-tools bundle (installed by init 4: install-host-tools)
|
||||
# comes first so ssh/scp/dig/vault/jq/etc. resolve to the
|
||||
# extracted Debian binaries + the static-binary downloads.
|
||||
# /bin + /sbin are needed because iputils-ping installs ping
|
||||
# under /bin (not /usr/bin) on Debian.
|
||||
value = "/tools/host-tools/root/usr/bin:/tools/host-tools/root/usr/sbin:/tools/host-tools/root/bin:/tools/host-tools/root/sbin:/tools/host-tools/bin:/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
|
||||
}
|
||||
env {
|
||||
# Point ld.so at the bundled libs so the host-tools binaries
|
||||
# find their shared-lib deps (libgssapi_krb5, libkrb5, etc.).
|
||||
# Both base images are bookworm so the libs match the
|
||||
# openclaw image's libc/libssl — no ABI conflicts expected.
|
||||
name = "LD_LIBRARY_PATH"
|
||||
value = "/tools/host-tools/root/usr/lib/x86_64-linux-gnu:/tools/host-tools/root/lib/x86_64-linux-gnu"
|
||||
}
|
||||
env {
|
||||
name = "TF_VAR_prod"
|
||||