Give the OpenClaw pod two new capabilities:
1. Host-tools bundle. New init container `install-host-tools` extracts
openssh-client + dnsutils + tmux + jq + ripgrep + fd + vault + yq +
friends into /tools/host-tools/, with the bookworm-slim libs the
binaries need. PATH + LD_LIBRARY_PATH on the main container point
ld.so at the bundle. Idempotent via /tools/host-tools/.installed-v1
marker; smoke test (ldd-based) fails the init at deploy time if any
binary has unresolved deps. Bundle is ~558 MB on the existing
/srv/nfs/openclaw/tools NFS.
2. devvm SSH + async task pattern. New init `setup-ssh-config` writes
id_rsa/config/known_hosts under /home/node/.openclaw/.ssh; main
container startup symlinks /home/node/.ssh → there. New
/usr/local/bin/openclaw-task wrapper on devvm manages long-running
work as tmux sessions on devvm (sessions and logs survive pod
restarts — they live on devvm, not in the pod). New init container
`seed-devvm-memory-note` drops a markdown note teaching the pattern;
main container startup now runs `openclaw memory index --force` so
the note is searchable on first boot.
Design + verified E2E flow in
docs/plans/2026-05-22-openclaw-devvm-access-design.md. Persistence test
green: spawned a 50s task from pod A, deleted pod A, new pod B saw the
task finish and read its full log.
Pre-existing keel.sh annotation drift on openclaw/{openlobster,
task_webhook} cleaned up in the same apply.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OpenClaw devvm access + async task pattern — design
Date: 2026-05-22
Stack: infra/stacks/openclaw
Status: Approved (in-session, see chat history 2026-05-22)
Goal
Give the OpenClaw pod (running in K8s) two new capabilities:
- Host-tools bundle: common Linux CLIs the upstream OpenClaw image doesn't ship (`ssh`, `scp`, `vault`, `dig`, `jq`, `yq`, `ripgrep`, `fd`, `gnupg`, `tmux`, etc.). OpenClaw can't `apt install` because the container runs as non-root `node` (uid 1000).
- devvm async task pattern: OpenClaw spawns long-running work as `tmux` sessions on devvm, sends prompts via `tmux send-keys`, captures progress via `tmux capture-pane`. Sessions live on devvm, so they survive OpenClaw pod restarts.
OpenClaw uses this combination as a trusted fallback for tasks too
expensive, sensitive, or stateful for in-pod execution: Vault lookups,
multi-step claude-code work, anything needing wizard's full home-lab
access.
Why now
- The in-pod sandbox is `security=full` but the container is minimal: no `ssh`, no `vault`, no `dig`, no `tmux`.
- The user wants OpenClaw to be a first-line agent that delegates heavy work to the dev VM rather than duplicate that work in a constrained pod.
- Long-running work (multi-minute `claude-code` sessions) shouldn't be tied to a single synchronous `claude -p` invocation; it needs persistence and pollability.
Architecture decision: stay on K8s
Discussed migrating OpenClaw to run directly on devvm (would obviate the host-tools bundle + most of the SSH setup). Decision: stay on K8s.
Reasons:
- Keeps HA (5-node cluster vs single devvm reboot)
- Keeps ingress/Authentik/Telegram entry chain intact
- Keeps Prometheus scrape + exporter sidecar
- Keeps PVC backup pipeline (LVM snapshots + Synology offsite)
- Resource isolation — a runaway LLM session can't stress wizard's daily-driver VM
- Migration cost is several days; this design is ~150 LoC + an 80-line wrapper
The mental model — "OpenClaw is sandboxed, delegates to wizard@devvm for trusted heavy lifting" — is a clean security boundary. Worth preserving.
Architecture
Pod side (infra/stacks/openclaw/main.tf)
Two new init containers added to the OpenClaw Deployment, after the existing four:
Init 5 — install-host-tools
- Image: `debian:bookworm-slim` (matches main container base for glibc compat)
- Idempotent: skips if `/tools/host-tools/.installed-v1` exists
- `apt-get install --download-only --no-install-recommends` for: `openssh-client dnsutils iputils-ping wget gnupg jq ripgrep fd-find ncdu htop strace tcpdump tmux unzip`
- Iterates `.deb` files in `/var/cache/apt/archives/`, `dpkg-deb -x` each into `/tools/host-tools/root/` (preserves `usr/bin`, `usr/sbin`, `usr/lib` layout)
- Downloads static binaries to `/tools/host-tools/bin/`:
  - `vault` (HashiCorp releases, pinned version)
  - `yq` (mikefarah/yq GitHub releases, pinned version)
- Smoke test: invokes `--version` on each bundled binary; fails init if any won't load (catches glibc / shared-lib drift at deploy time, not runtime)
- Writes marker file with version
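The install steps above can be sketched as a marker-guarded shell routine. This is a minimal sketch under stated assumptions, not the actual init script: the helper names `run_once`, `extract_debs`, and `check_shared_libs` and the `BUNDLE_VERSION` variable are hypothetical, and the `ldd` check is shown as one possible way to detect unresolved shared libraries.

```shell
# Sketch only: helper names and BUNDLE_VERSION are hypothetical.
TOOLS_DIR="${TOOLS_DIR:-/tools/host-tools}"
MARKER="$TOOLS_DIR/.installed-v1"

# Unpack every downloaded .deb under $TOOLS_DIR/root, preserving the usr/ layout.
extract_debs() {
    archive_dir="${1:-/var/cache/apt/archives}"
    mkdir -p "$TOOLS_DIR/root" || return 1
    for deb in "$archive_dir"/*.deb; do
        [ -e "$deb" ] || continue   # the glob matched nothing
        dpkg-deb -x "$deb" "$TOOLS_DIR/root" || return 1
    done
}

# Return non-zero if any given binary reports an unresolved shared library.
check_shared_libs() {
    for bin in "$@"; do
        if ldd "$bin" 2>/dev/null | grep -q 'not found'; then
            echo "unresolved deps: $bin" >&2
            return 1
        fi
    done
}

# Run the given command only if the marker is absent; record the version on success.
run_once() {
    if [ -e "$MARKER" ]; then
        echo "marker present, skipping"
        return 0
    fi
    "$@" || return 1
    mkdir -p "$TOOLS_DIR" || return 1
    echo "${BUNDLE_VERSION:-v1}" > "$MARKER"
}
```

An init script built on this sketch would call something like `run_once extract_debs` after the download step. Because the marker is written only when the command succeeds, a failed install is retried on the next pod start.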
Init 6 — setup-ssh-config
- Image: uses the just-installed host-tools (debian:bookworm-slim base with `/tools/host-tools/root/usr/bin` on PATH so `ssh-keyscan` works)
- Runs after `install-host-tools`
- Idempotent: skips if `/home/node/.openclaw/.ssh/.configured-v1` exists
- Creates `/home/node/.openclaw/.ssh/` (uid 1000)
- Copies `/ssh/id_rsa` (tmpfs secret mount) → `~/.ssh/id_rsa` with 0600 (the secret tmpfs mount has wider perms that openssh rejects)
- Writes `~/.ssh/config`:

      Host devvm
          HostName 10.0.10.10
          User wizard
          IdentityFile ~/.ssh/id_rsa
          UserKnownHostsFile ~/.ssh/known_hosts
          StrictHostKeyChecking yes

  PATH handling on the remote side: devvm's sshd uses the default non-interactive PATH (`/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin`) and does NOT load `~/.profile` or `~/.bashrc` (memory id=740). Client-side `SetEnv PATH=…` doesn't help because sshd's `AcceptEnv` is `LANG LC_*` only. Solution: install the binaries openclaw cares about into `/usr/local/bin/` on devvm (see "Devvm side" below).
- Pre-seeds `~/.ssh/known_hosts` via `ssh-keyscan -H 10.0.10.10`
- Writes marker file
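The key-copy and config-writing steps above might look roughly like the following. This is a sketch, not the init container's actual script: the function name `setup_ssh_dir` and its two arguments are hypothetical, and the `ssh-keyscan` step is left as a comment so the function can run without network access.

```shell
# Hypothetical helper: build an openssh-acceptable directory from a secret mount.
# $1 = source private key (e.g. /ssh/id_rsa), $2 = target dir (e.g. /home/node/.openclaw/.ssh)
setup_ssh_dir() {
    src_key="$1"
    ssh_dir="$2"
    marker="$ssh_dir/.configured-v1"

    # Idempotent: nothing to do if a previous run completed.
    if [ -e "$marker" ]; then
        return 0
    fi

    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1

    # openssh refuses private keys readable by group/other, so tighten to 0600.
    cp "$src_key" "$ssh_dir/id_rsa" || return 1
    chmod 600 "$ssh_dir/id_rsa" || return 1

    cat > "$ssh_dir/config" <<'EOF'
Host devvm
    HostName 10.0.10.10
    User wizard
    IdentityFile ~/.ssh/id_rsa
    UserKnownHostsFile ~/.ssh/known_hosts
    StrictHostKeyChecking yes
EOF
    chmod 600 "$ssh_dir/config" || return 1

    # The known_hosts pre-seed would go here, e.g.:
    #   ssh-keyscan -H 10.0.10.10 > "$ssh_dir/known_hosts"

    touch "$marker"
}
```

The marker is written last, so an interrupted run is repeated from the start on the next pod start.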
Main container
- `PATH` env updated: prepend `/tools/host-tools/root/usr/bin:/tools/host-tools/root/usr/sbin:/tools/host-tools/bin`
- No other changes to the startup command
Devvm side
/usr/local/bin/openclaw-task wrapper
Canonical source: infra/stacks/openclaw/files/openclaw-task.sh.
Installed to devvm at /usr/local/bin/openclaw-task (sudo cp, sudo chmod +x) so non-interactive SSH finds it on the default PATH without
needing ~/.profile. Updates: re-run the install steps from the
canonical source.
Also: sudo ln -s /home/wizard/.local/bin/claude /usr/local/bin/claude
so ssh devvm claude … works in non-interactive mode. vault and tmux
are already at /usr/bin/ (system packages) so no symlink needed for
those.
POSIX shell script. Subcommands:
| Subcommand | Behavior |
|---|---|
| `new <id> <cmd...>` | Spawns detached tmux session `openclaw-task-<id>`, pipes pane output to `~/openclaw-tasks/<id>.log` |
| `claude <id> <prompt>` | Convenience: spawns interactive `claude` in a tmux session, `send-keys` the prompt + Enter |
| `send <id> <keys...>` | `tmux send-keys -t openclaw-task-<id> "$@"` (caller supplies `Enter` literal if needed) |
| `capture <id> [lines]` | `tmux capture-pane -t … -p -S -<lines>` (default last 1000) |
| `log <id>` | `cat ~/openclaw-tasks/<id>.log` |
| `tail <id>` | `tail -n 100 -f ~/openclaw-tasks/<id>.log` (mainly for human ops) |
| `list` | tmux session list filtered to `openclaw-task-*`, one id per line |
| `status <id>` | `running` if tmux session alive, `ended` otherwise |
| `kill <id>` | `tmux kill-session -t openclaw-task-<id>` (log file is kept) |
| `purge <id>` | `kill` + `rm -f ~/openclaw-tasks/<id>.log` |
Task state lives entirely on devvm:
- tmux sessions persist across SSH disconnects and OpenClaw pod restarts
- `~/openclaw-tasks/<id>.log` is the durable transcript even after a session is killed
- No central database: `tmux list-sessions` is the source of truth for "what's running"
Naming convention: tmux sessions are prefixed openclaw-task- so they
don't collide with wizard's own tmux work (0, Openclaw, read-only).
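To make the subcommand table and the naming convention concrete, here is a minimal sketch of how `new`, `status`, and `list` could be built on tmux. It is illustrative only and is not the canonical `openclaw-task.sh`: the function names and the `OPENCLAW_TASK_LOG_DIR` override are hypothetical.

```shell
# Sketch only: function names and OPENCLAW_TASK_LOG_DIR are hypothetical.
PREFIX="openclaw-task-"
LOG_DIR="${OPENCLAW_TASK_LOG_DIR:-$HOME/openclaw-tasks}"

# Map a task id to its tmux session name.
session_name() {
    printf '%s%s\n' "$PREFIX" "$1"
}

# Read tmux session names on stdin; print only task ids (prefix stripped).
filter_task_ids() {
    sed -n "s/^${PREFIX}//p"
}

# new <id> <cmd...>: start a detached session and append pane output to the log.
task_new() {
    id="$1"
    shift
    mkdir -p "$LOG_DIR" || return 1
    tmux new-session -d -s "$(session_name "$id")" "$*" || return 1
    tmux pipe-pane -t "$(session_name "$id")" -o "cat >> '$LOG_DIR/$id.log'"
}

# status <id>: "running" while the session exists, "ended" otherwise.
# The leading "=" asks tmux for an exact session-name match.
task_status() {
    if tmux has-session -t "=$(session_name "$1")" 2>/dev/null; then
        echo running
    else
        echo ended
    fi
}

# list: one task id per line; sessions without the prefix are ignored.
task_list() {
    tmux list-sessions -F '#{session_name}' 2>/dev/null | filter_task_ids
}
```

One caveat of this sketch: `pipe-pane` attaches after the session starts, so a command that exits almost immediately may end before any output is captured.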
Memory note
File at `/workspace/memory/projects/openclaw-runtime/devvm-fallback.md`
teaching OpenClaw the pattern. Indexed by the existing daily
memory-sync CronJob (or via manual `node openclaw.mjs memory index --force` for the initial seed).
Content (verbatim):
# Using devvm as a fallback
When in-pod tools/permissions block you, SSH to devvm and use it. The
devvm runs as wizard with full home-lab access (Vault, kubectl, git
repos, Cloudflare, etc.) and has Claude Code v2+ installed.
## One-shot lookup
ssh devvm 'vault kv get -field=brave_api_key secret/openclaw'
ssh devvm 'claude -p "investigate why frigate is restarting"'
## Long-running async work — USE THIS for anything > ~2 min
Spawn in a tmux session on devvm. Sessions survive OpenClaw pod restarts.
# spawn
ssh devvm openclaw-task new my-task "claude -p --dangerously-skip-permissions 'do the thing'"
# poll progress (last 1000 lines of pane)
ssh devvm openclaw-task capture my-task
# interactive claude (send follow-up prompts)
ssh devvm openclaw-task claude my-task "initial prompt"
ssh devvm openclaw-task send my-task "follow-up prompt" Enter
# housekeeping
ssh devvm openclaw-task list
ssh devvm openclaw-task status my-task
ssh devvm openclaw-task kill my-task
Logs persist at ~/openclaw-tasks/<id>.log on devvm even after a session
is killed. Use `ssh devvm openclaw-task log <id>` to retrieve them.
Devvm: no infra changes
Pre-existing state verified 2026-05-22:
- pubkey from `/ssh/id_rsa` (Vault `secret/openclaw → ssh_key`) matches the `ssh-ed25519 AAAA…lug node@openclaw-58cd9f7987-884bv` line in `~/.ssh/authorized_keys` (the comment is a stale pod name; the key itself is stable from Vault)
- sshd listens on 0.0.0.0:22 ✓
- `claude` v2.1.126 at `/home/wizard/.local/bin/claude` ✓
- `tmux` 3.4 installed, server already running with existing user sessions ✓
Only changes (one-time, done in the same session via sudo):
- Install `openclaw-task` wrapper to `/usr/local/bin/openclaw-task`
- Create symlink `/usr/local/bin/claude` pointing at `/home/wizard/.local/bin/claude`
Tradeoffs / risks
- Bundle size on NFS: ~30MB extracted. Acceptable on `/srv/nfs/openclaw/tools`.
- Library version drift: bundled binaries link against bookworm libs. Smoke test in `install-host-tools` catches breakage on the next pod restart if upstream OpenClaw image rebases.
- Full-shell SSH: explicit user choice. Blast radius if openclaw is prompt-injected = full wizard access. Mitigation: keep OpenClaw's plugin allowlist tight (current allow list: `memory-core, recruiter-api, telegram, openrouter, brave, openai, codex`).
- tmux server lifecycle on devvm: if wizard's tmux server dies (rare; usually only on devvm reboot), in-flight openclaw tasks are killed. Acceptable for home lab. Task logs persist regardless.
- Task log unbounded growth: `~/openclaw-tasks/*.log` grows forever. Out of scope here. User can add a `find -mtime +N -delete` cron later.
- Init container order: `setup-ssh-config` depends on `install-host-tools` finishing first. K8s init containers run sequentially in declaration order, so no explicit dependency mechanism is needed.
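For the log-growth item, a later cleanup job could be as small as the sketch below. It is not part of this change: the function name `prune_task_logs` and the 14-day retention in the usage line are hypothetical, and `-maxdepth` and `-delete` are GNU/BSD `find` extensions, not POSIX.

```shell
# Hypothetical helper: delete task logs not modified for more than N days.
# $1 = log directory, $2 = retention in days
prune_task_logs() {
    log_dir="$1"
    days="$2"
    # A missing directory simply means there is nothing to prune.
    [ -d "$log_dir" ] || return 0
    find "$log_dir" -maxdepth 1 -type f -name '*.log' -mtime "+$days" -delete
}

# Example cron-style invocation (assumed retention of 14 days):
#   prune_task_logs "$HOME/openclaw-tasks" 14
```

Because the test is on modification time, the log of a task that is still writing output is never old enough to match.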
Testing — E2E flows required by user
- Tools present: `kubectl -n openclaw exec <pod> -c openclaw -- ssh -V` returns version, same for `dig`, `vault`, `jq`, `yq`, `tmux`, `rg`.
- SSH happy path: `kubectl -n openclaw exec <pod> -c openclaw -- ssh devvm 'hostname'` returns `devvm`.
- Claude one-shot: `kubectl -n openclaw exec <pod> -c openclaw -- ssh devvm 'claude -p "what is 1+1"'` returns `2`.
- Async task lifecycle (the remote command is single-quoted so the `;` reaches the wrapper instead of being split by the remote shell):
  - `ssh devvm 'openclaw-task new test-1 "sleep 30; echo done"'`
  - `ssh devvm openclaw-task list` contains `test-1`
  - `ssh devvm openclaw-task status test-1` returns `running`
  - wait 35s
  - `ssh devvm openclaw-task log test-1` contains `done`
  - `ssh devvm openclaw-task status test-1` returns `ended`
- Persistence test (the key requirement):
  - Spawn long task: `ssh devvm 'openclaw-task new persist-1 "sleep 120; echo survived > /tmp/persist-1.proof"'`
  - `kubectl -n openclaw delete pod <openclaw-pod>`; pod recreated
  - Wait for new pod ready (init containers run, skip via marker, fast)
  - `kubectl -n openclaw exec <new-pod> -c openclaw -- ssh devvm openclaw-task list` contains `persist-1`
  - Wait for original sleep to finish; verify `/tmp/persist-1.proof` contains `survived` from new pod
- Memory note lookup: `kubectl -n openclaw exec <pod> -c openclaw -- node openclaw.mjs memory search 'devvm fallback'` returns the note.
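The fixed "wait 35s" steps in the lifecycle checks could instead poll the task status. The helper below is a sketch with a hypothetical name (`wait_for_ended`). It takes the status command as arguments, so it works with `ssh devvm openclaw-task status <id>` or with a stub.

```shell
# Hypothetical helper: poll a status command until it prints "ended".
# $1 = timeout in seconds, $2 = poll interval in seconds (use >= 1), rest = status command
wait_for_ended() {
    timeout_s="$1"
    interval_s="$2"
    shift 2
    elapsed=0
    while [ "$elapsed" -le "$timeout_s" ]; do
        if [ "$("$@")" = "ended" ]; then
            return 0
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    # Timed out while the task was still reported as running.
    return 1
}

# Example (assumes the devvm host alias and wrapper described in this document):
#   wait_for_ended 60 5 ssh devvm openclaw-task status test-1
```

The elapsed time counts only the sleeps, so the real wall-clock timeout is slightly longer when the status command itself is slow.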
Docs to update with the change
- `infra/docs/plans/2026-05-22-openclaw-devvm-access-design.md` (this doc)
- `infra/docs/plans/2026-05-22-openclaw-devvm-access-plan.md` (implementation plan)
- `infra/.claude/reference/service-catalog.md` (one-line addition under OpenClaw: "Has SSH to devvm with host-tools bundle; long-running async tasks via `openclaw-task` wrapper on devvm")
- `infra/.claude/CLAUDE.md` "Known Issues" section is left alone; none of the existing OpenClaw caveats change.