infra/cli
Viktor Barzin 48225f2dea homelab CLI v0.7: add ha token + ha ssh for Home Assistant
Mined another devvm user's Claude sessions for repeated, hand-rolled command
patterns worth absorbing into the shared CLI. The dominant signal was Home
Assistant "Sofia" work: a `kubectl | base64 | jq` token-extraction pipeline
re-derived ~420x, and a bespoke non-interactive `ssh -o …` invocation reinvented
~30x — every session. The existing `home-assistant-sofia.py` already covers the
API but goes unused from an arbitrary cwd (needs an env var set + a cwd-relative
path), so agents bypassed it and hand-rolled everything.

Add two verbs covering exactly the gaps the `ha` MCP can't (entity state/control
stays with the MCP):
- `ha token [--instance sofia|london]` (read): resolves the long-lived API token
  live from k8s secret openclaw/openclaw-secrets via the ambient kubeconfig — no
  pre-set env var. Composes as `curl -H "Authorization: Bearer $(homelab ha token)"`.
- `ha ssh [--instance sofia|london] -- <cmd>` (write): deterministic
  non-interactive ssh to the HA host using the invoking user's key.

Also fix the root cause: `home-assistant-sofia.py` now falls back to
`homelab ha token` when its env var is unset (works from any directory), and the
home-assistant skill points agents at these verbs + `homelab metrics query`
instead of hand-rolled curls. README + ADR-0012 + AGENTS.md updated per the
per-verb-group convention.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 23:46:09 +00:00
cmd_ci.go homelab: v0.4.0 — ci/deploy verbs (watch what you trigger) 2026-06-19 10:59:14 +00:00
cmd_claim.go homelab: scaffold unified CLI (registry, manifest, claim/release) in infra/cli 2026-06-18 19:12:57 +00:00
cmd_deploy.go homelab: v0.4.0 — ci/deploy verbs (watch what you trigger) 2026-06-19 10:59:14 +00:00
cmd_ha.go homelab CLI v0.7: add ha token + ha ssh for Home Assistant 2026-06-20 23:46:09 +00:00
cmd_ha_test.go homelab CLI v0.7: add ha token + ha ssh for Home Assistant 2026-06-20 23:46:09 +00:00
cmd_k8s.go homelab: v0.3.1 — fix k8s db PG target (resolve CNPG primary pod, not the Service) 2026-06-19 09:09:34 +00:00
cmd_memory.go homelab: add memory verb-group (v0.3.0) — direct claude-memory HTTP client 2026-06-19 05:56:25 +00:00
cmd_net.go homelab: v0.5.0 — net/dns/metrics/logs probes (endpoint resolution) 2026-06-19 11:27:31 +00:00
cmd_obs.go homelab: v0.5.0 — net/dns/metrics/logs probes (endpoint resolution) 2026-06-19 11:27:31 +00:00
cmd_tf.go homelab: add tf verbs + stack/git-crypt substrate 2026-06-18 19:16:33 +00:00
cmd_tf_test.go homelab: add tf verbs + stack/git-crypt substrate 2026-06-18 19:16:33 +00:00
cmd_usage.go homelab: v0.6.0 — usage telemetry (usage top), evidence-driven verb prioritization 2026-06-19 22:29:01 +00:00
cmd_work.go homelab: v0.4.0 — ci/deploy verbs (watch what you trigger) 2026-06-19 10:59:14 +00:00
cmd_work_test.go homelab: add work verbs (start/land/clean) with a land verification gate 2026-06-18 19:24:08 +00:00
command.go homelab: v0.6.0 — usage telemetry (usage top), evidence-driven verb prioritization 2026-06-19 22:29:01 +00:00
command_test.go homelab: scaffold unified CLI (registry, manifest, claim/release) in infra/cli 2026-06-18 19:12:57 +00:00
Dockerfile fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
email_alias.go fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
git.go fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
go.mod fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
go.sum fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
homelab.go homelab CLI v0.7: add ha token + ha ssh for Home Assistant 2026-06-20 23:46:09 +00:00
k8s.go homelab: v0.3.1 — fix k8s db PG target (resolve CNPG primary pod, not the Service) 2026-06-19 09:09:34 +00:00
k8s_test.go homelab: v0.3.1 — fix k8s db PG target (resolve CNPG primary pod, not the Service) 2026-06-19 09:09:34 +00:00
main.go homelab: scaffold unified CLI (registry, manifest, claim/release) in infra/cli 2026-06-18 19:12:57 +00:00
memory.go homelab: add memory verb-group (v0.3.0) — direct claude-memory HTTP client 2026-06-19 05:56:25 +00:00
memory_test.go homelab: add memory verb-group (v0.3.0) — direct claude-memory HTTP client 2026-06-19 05:56:25 +00:00
openwrt_dns.go fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
presence.go homelab: scaffold unified CLI (registry, manifest, claim/release) in infra/cli 2026-06-18 19:12:57 +00:00
presence_test.go homelab: scaffold unified CLI (registry, manifest, claim/release) in infra/cli 2026-06-18 19:12:57 +00:00
probe.go homelab: v0.5.0 — net/dns/metrics/logs probes (endpoint resolution) 2026-06-19 11:27:31 +00:00
probe_test.go homelab: v0.5.0 — net/dns/metrics/logs probes (endpoint resolution) 2026-06-19 11:27:31 +00:00
README.md homelab CLI v0.7: add ha token + ha ssh for Home Assistant 2026-06-20 23:46:09 +00:00
repo.go homelab: add work verbs (start/land/clean) with a land verification gate 2026-06-18 19:24:08 +00:00
repo_test.go homelab: add tf verbs + stack/git-crypt substrate 2026-06-18 19:16:33 +00:00
run.go homelab: add tf verbs + stack/git-crypt substrate 2026-06-18 19:16:33 +00:00
stack.go homelab: add tf verbs + stack/git-crypt substrate 2026-06-18 19:16:33 +00:00
stack_test.go homelab: add tf verbs + stack/git-crypt substrate 2026-06-18 19:16:33 +00:00
telemetry.go homelab: v0.6.0 — usage telemetry (usage top), evidence-driven verb prioritization 2026-06-19 22:29:01 +00:00
update_viktorbarzin_me.go homelab: scaffold unified CLI (registry, manifest, claim/release) in infra/cli 2026-06-18 19:12:57 +00:00
update_viktorbarzin_me_technitium.go fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
usage_test.go homelab: v0.6.0 — usage telemetry (usage top), evidence-driven verb prioritization 2026-06-19 22:29:01 +00:00
VERSION homelab CLI v0.7: add ha token + ha ssh for Home Assistant 2026-06-20 23:46:09 +00:00
vpn.go fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
woodpecker.go homelab: v0.4.0 — ci/deploy verbs (watch what you trigger) 2026-06-19 10:59:14 +00:00
woodpecker_test.go homelab: v0.4.0 — ci/deploy verbs (watch what you trigger) 2026-06-19 10:59:14 +00:00

homelab

homelab is the unified, agent-facing CLI for operating this homelab — one composable, JSON-capable surface for the operations agents run over and over, discovered progressively at runtime. It is grown in place from this directory (the former infra-cli), and the legacy webhook use-cases still work (see below).

It encodes actions, never judgment: methodology (debugging, TDD, review) and third-party/owned MCP servers (e.g. phpIPAM) are deliberately out of scope.

Usage

homelab <command> [args]
homelab manifest [--json]    # list every verb + its read/write tier (discovery entrypoint)
homelab version

v0.1 verbs — the infra inner-loop

| Command | Tier | What it does |
|---|---|---|
| `claim <kind>:<name> --purpose "…"` | write | claim a shared resource on the presence board (wraps scripts/presence) |
| `release <kind>:<name>` | write | release a presence claim |
| `tf plan <stack>` | read | scripts/tg plan for a stack (resolved from cwd) |
| `tf validate <stack>` | read | scripts/tg validate |
| `tf fmt <stack>` | read | terraform fmt -recursive on the stack |
| `tf force-unlock <stack> <lock-id>` | write | release a stuck state lock |
| `tf apply <stack>` | write | scripts/tg apply — auto-claims stack:<name>, always releases, warns it's out-of-band |
| `work start <topic>` | write | create .worktrees/<topic> on <user>/<topic> off <remote>/master; enter with native EnterWorktree |
| `work land [--verify-cmd "…"] [--no-verify]` | write | merge master in → verify → push HEAD:master (non-ff retry; PR fallback) |
| `work clean <topic>` | write | remove a task's worktree + branch (run from the main checkout) |

v0.2 verbs — Kubernetes

Built on an app→namespace→pod resolver: <app> defaults to the namespace (most namespaces hold one app); the target defaults to deploy/<app> and lets kubectl resolve the pod. Override with -n/--pod/-c/-l/--tty. Uses the ambient kubeconfig.

| Command | Tier | What it does |
|---|---|---|
| `k8s status [ns]` | read | pods (wide) + recent non-Normal events (-A if no ns) |
| `k8s get <ns> <resource> […]` | read | kubectl -n <ns> get … passthrough |
| `k8s logs <app>` | read | logs for deploy/<app> (--tail default 200; -c/--previous/--since/-l) |
| `k8s describe <app> [resource]` | read | describe the deployment (or an explicit resource) |
| `k8s debug <app>` | read | one-shot triage: pods + workloads + describe + recent logs + events |
| `k8s pf <app> <local:remote> [target]` | read | port-forward to svc/<app> (or an explicit target) |
| `k8s rollout-status <app>` | read | rollout status deploy/<app> |
| `k8s db <app> [--mysql] [--db N] -- "<SQL>"` | write | exec into the dbaas DB (PG pg-cluster-rw, or MySQL with env-password wrapper) |
| `k8s exec <app> [--tty] -- <cmd>` | write | exec in the app's pod |
| `k8s restart <app>` | write | rollout restart deploy/<app> then wait for status |
| `k8s rm-pod <name> -n <ns> [--job] [--force]` | write | delete a stuck pod/job only |

Config-mutation verbs (apply/edit/patch/scale/create) are intentionally not exposed — they stay raw kubectl, per the Terraform-only policy.
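
The resolver defaults described at the top of this section can be sketched roughly as follows. This is a minimal illustration, not the code in k8s.go: the `target` type and the `resolveTarget` and `logsArgs` helpers are hypothetical names, and only the -n and --pod overrides are modelled.

```go
package main

import "fmt"

// target is a hypothetical holder for the kubectl coordinates of an app.
type target struct {
	Namespace string
	Resource  string
}

// resolveTarget applies the documented defaults: the namespace defaults to
// the app name and the resource to deploy/<app>. A non-empty nsOverride (-n)
// or podOverride (--pod) replaces the corresponding default.
func resolveTarget(app, nsOverride, podOverride string) target {
	out := target{Namespace: app, Resource: "deploy/" + app}
	if nsOverride != "" {
		out.Namespace = nsOverride
	}
	if podOverride != "" {
		out.Resource = "pod/" + podOverride
	}
	return out
}

// logsArgs builds the kubectl argv for a logs call against the target.
func logsArgs(tg target, tail int) []string {
	return []string{"-n", tg.Namespace, "logs", tg.Resource, fmt.Sprintf("--tail=%d", tail)}
}
```

With no overrides, `resolveTarget("grafana", "", "")` yields namespace `grafana` and resource `deploy/grafana`, which leaves pod selection to kubectl.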

tf resolves the stack dir by walking up from cwd to the infra root and delegates to scripts/tg (which owns state decrypt/encrypt, the Vault lock, and the ingress auth-comment check). git-crypt filter flags are auto-injected on git operations in the encrypted infra repo.

work land refuses to push when it cannot verify (no --verify-cmd and no auto-detected suite) unless you pass --no-verify — landing to master unverified must be deliberate. After pushing it watches CI to green (ci watch on the landed commit) and fails if the pipeline does; pass --no-ci-watch to skip.
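
The verification gate can be pictured as a small decision function. The sketch below is an assumption about its shape: `pickVerify` and `errUnverified` are hypothetical names, and letting --no-verify take precedence over --verify-cmd is a guess rather than documented behaviour.

```go
package main

import "errors"

// errUnverified is returned when nothing can verify the change and the
// caller has not explicitly opted out.
var errUnverified = errors.New("refusing to push: no --verify-cmd, no detected suite, and --no-verify not set")

// pickVerify decides what to run before pushing. verifyCmd is the value of
// --verify-cmd, detected is an auto-detected test command (empty if none),
// and noVerify mirrors --no-verify. It returns the command to run, or an
// empty command when verification is deliberately skipped.
func pickVerify(verifyCmd, detected string, noVerify bool) (string, error) {
	switch {
	case noVerify:
		return "", nil // assumption: an explicit opt-out wins
	case verifyCmd != "":
		return verifyCmd, nil
	case detected != "":
		return detected, nil
	default:
		return "", errUnverified
	}
}
```

The only path that returns an error is the one where nothing can verify the change and no opt-out was given, so an unverified push cannot happen by accident.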

Tiers are recorded per verb so a future PreToolUse classifier can auto-allow reads / prompt writes; v0.1 allows everything and relies on existing gates (permission mode, presence claims, plan approval).
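
A future classifier built on the recorded tiers might look something like this. It is purely illustrative: the `verbTiers` map and `decide` function are hypothetical, the four entries are copied from the tables in this README, and treating unknown verbs as writes is an assumption.

```go
package main

// tier is the read/write classification recorded per verb.
type tier string

const (
	tierRead  tier = "read"
	tierWrite tier = "write"
)

// verbTiers is a hand-copied excerpt; the real registry lives in the CLI.
var verbTiers = map[string]tier{
	"tf plan":     tierRead,
	"tf apply":    tierWrite,
	"k8s logs":    tierRead,
	"k8s restart": tierWrite,
}

// decide returns "allow" for read-tier verbs and "prompt" for write-tier or
// unknown verbs (unknown is treated as the riskier case).
func decide(verb string) string {
	if verbTiers[verb] == tierRead {
		return "allow"
	}
	return "prompt"
}
```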

v0.3 verbs — memory

A thin HTTP client over the claude-memory service (the same backend the memory MCP wraps), authed with CLAUDE_MEMORY_API_KEY against CLAUDE_MEMORY_API_URL (the env the hooks already set; defaults to the ingress). Because it hits the HTTP API directly, it works even when the MCP frontend is down.

| Command | Tier | What it does |
|---|---|---|
| `memory recall "<context>" [--query --category --sort --limit]` | read | semantic search (server-side ranking) — the navigate workhorse |
| `memory list [--category --tag --limit]` | read | recent memories |
| `memory categories` / `memory tags` / `memory stats` | read | enumerate the store |
| `memory secret <id>` | read | reveal a sensitive memory's content |
| `memory store "<content>" [--category --tags --keywords --importance --sensitive]` | write | store a memory |
| `memory update <id> [--content --tags --importance]` | write | edit a memory |
| `memory delete <id>` | write | delete a memory |

All read/write paths are validated against the live API (incl. a store→recall→delete round-trip). This gives full data-plane parity with the MCP; the eventual deprecation (rewiring the per-prompt auto-recall + auto-learn hooks to the CLI, then uninstalling the MCP) is a separate, deliberate follow-up — see docs/adr/0008.

v0.4 verbs — ci / deploy

Watch what you trigger, without hand-rolling Woodpecker/kubectl polling. ci talks to the Woodpecker API (token from WOODPECKER_TOKEN or Vault secret/ci/global) via the internal Traefik LB, resolving the repo from the cwd remote, with retries that ride Woodpecker's intermittent empty responses.

| Command | Tier | What it does |
|---|---|---|
| `ci status [commit]` | read | pipeline status for HEAD (or a commit) |
| `ci watch [commit]` | read | poll the pipeline to terminal; exit non-zero on failure |
| `deploy wait <ns>/<deploy> [--sha SHA]` | read | wait for the deployment image to match the sha, then rollout status (rollout status alone lies on the old ReplicaSet) |

work land now calls ci watch on the landed commit automatically (skip with --no-ci-watch), closing the v0.1 "doesn't wait for CI" gap. ci logs (failing step) is deferred to v0.4.1 — Woodpecker's per-pipeline detail/log endpoints were the least reliable; status/watch use the list endpoint that works.
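
The "poll to terminal, ride out empty responses" behaviour of ci watch can be sketched as a loop over an injected fetch function. This is an illustration under assumptions, not the code in woodpecker.go: `watch` and `terminal` are hypothetical names, and the list of terminal status strings is modelled on Woodpecker's pipeline states rather than taken from this repository.

```go
package main

import (
	"fmt"
	"time"
)

// terminal reports whether a pipeline status is final. The status names are
// an assumption modelled on Woodpecker's pipeline states.
func terminal(status string) bool {
	switch status {
	case "success", "failure", "error", "killed", "declined", "skipped":
		return true
	}
	return false
}

// watch polls fetch until the pipeline reaches a terminal status or maxPolls
// is exhausted. An empty status or a fetch error is treated as transient and
// retried. It returns a non-nil error for anything other than success, so a
// caller can exit non-zero.
func watch(fetch func() (string, error), interval time.Duration, maxPolls int) (string, error) {
	for i := 0; i < maxPolls; i++ {
		status, err := fetch()
		if err == nil && terminal(status) {
			if status != "success" {
				return status, fmt.Errorf("pipeline finished with status %q", status)
			}
			return status, nil
		}
		time.Sleep(interval)
	}
	return "", fmt.Errorf("pipeline not terminal after %d polls", maxPolls)
}
```

Injecting `fetch` keeps the loop independent of the HTTP client, so the retry behaviour can be exercised with a fake sequence of statuses.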

v0.5 verbs — net / dns / metrics / logs

Reachability + observability probes. Their value is endpoint resolution — the non-obvious "which host, public or LB, what auth, what URL shape" reasoning you'd otherwise re-derive every time — not the HTTP call itself. All reach internal ingresses through the Traefik LB (the Go form of curl --resolve host:443:10.0.20.203).

| Command | Tier | What it does |
|---|---|---|
| `net check <host> [path]` | read | probes the host two ways — external (public DNS → Cloudflare) vs internal (Traefik LB) — with status + latency, so you can tell where a break is (CF? app? the LB path?) |
| `dns lookup <name> [type]` | read | resolves via Technitium (10.0.20.201) and public (1.1.1.1), diffed — surfaces split-horizon vs propagation gaps |
| `metrics query "<promql>"` | read | Prometheus instant query (prometheus-query.viktorbarzin.lan); prints value {labels} or --json |
| `metrics alerts` | read | currently-firing alerts (via the synthetic ALERTS series — the query frontend has no /api/v1/alerts) |
| `logs query "<logql>" [--since 1h] [--limit N]` | read | Loki range query (loki.viktorbarzin.lan); prints log lines or --json |

Quote the PromQL/LogQL. These hit auth-free internal ingresses — no port-forward, no kubectl. (In-cluster-only endpoints like Alertmanager stay out of scope; the firing set is reachable via ALERTS instead.)
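
For readers wondering what "the Go form of curl --resolve" amounts to, one common approach is a custom `DialContext` on an `http.Transport` that swaps the dial host for the LB address. The sketch below shows that technique with standard-library calls only; `pinnedAddr` and `newLBClient` are hypothetical names, and probe.go may be structured differently.

```go
package main

import (
	"context"
	"net"
	"net/http"
	"time"
)

// pinnedAddr swaps the host of a "host:port" dial address for lbIP while
// keeping the port, mirroring curl --resolve host:443:<lbIP>.
func pinnedAddr(lbIP, addr string) (string, error) {
	_, port, err := net.SplitHostPort(addr)
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(lbIP, port), nil
}

// newLBClient returns an HTTP client whose TCP connections always go to lbIP.
// The URL's hostname is still used for the Host header and TLS SNI, so
// host-based ingress routing keeps working.
func newLBClient(lbIP string, timeout time.Duration) *http.Client {
	dialer := &net.Dialer{Timeout: timeout}
	transport := &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			pinned, err := pinnedAddr(lbIP, addr)
			if err != nil {
				return nil, err
			}
			return dialer.DialContext(ctx, network, pinned)
		},
	}
	return &http.Client{Transport: transport, Timeout: timeout}
}
```

A request to https://prometheus-query.viktorbarzin.lan/ made through `newLBClient("10.0.20.203", 5*time.Second)` would connect to 10.0.20.203:443 while still presenting the original hostname to the ingress.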

v0.6 — usage telemetry (usage top)

Makes "which verbs are actually used, by everyone" a query instead of a guess — so adding the next verb is evidence-driven, not shaped by one person's habits.

Every dispatched verb emits one fire-and-forget Loki line: {job, user, verb} labels + exit=N ver=X. Only the verb path and exit code are recorded, never args, paths, flags, or secrets. It's best-effort (tight timeout, errors swallowed, never affects the command) and opt-out via HOMELAB_TELEMETRY=0. Because the sink is the shared Loki, aggregate usage is queryable without reading anyone's home directory, which makes it the privacy-preserving answer to "what does the team use."

| Command | Tier | What it does |
|---|---|---|
| `usage top [--since 30d] [--user U] [--json]` | read | rank verbs by invocation count across all users (or one), via sum by (verb) (count_over_time({job="homelab-usage"}[…])) |
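
To make the telemetry line concrete, here is a rough sketch of a payload in the shape Loki's push endpoint (/loki/api/v1/push) accepts. The `usagePayload` and `telemetryEnabled` helpers and the struct names are hypothetical; the label set, the `homelab-usage` job name, and the `exit=N ver=X` line format come from the description above, and everything else is an assumption about telemetry.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// lokiStream is one labelled stream in a Loki push request.
type lokiStream struct {
	Stream map[string]string `json:"stream"`
	Values [][2]string       `json:"values"` // [unix-nanosecond timestamp, log line]
}

// lokiPush is the top-level push request body.
type lokiPush struct {
	Streams []lokiStream `json:"streams"`
}

// usagePayload builds the JSON body for one usage event. Only the verb path,
// exit code and version are included, never arguments or flags.
func usagePayload(user, verb string, exit int, ver string, unixNano int64) ([]byte, error) {
	push := lokiPush{Streams: []lokiStream{{
		Stream: map[string]string{"job": "homelab-usage", "user": user, "verb": verb},
		Values: [][2]string{{strconv.FormatInt(unixNano, 10), fmt.Sprintf("exit=%d ver=%s", exit, ver)}},
	}}}
	return json.Marshal(push)
}

// telemetryEnabled treats HOMELAB_TELEMETRY=0 as the opt-out; any other
// value, including unset (empty), leaves telemetry on.
func telemetryEnabled(envValue string) bool {
	return envValue != "0"
}
```

Because verb is a label rather than part of the log line, the `sum by (verb)` aggregation used by usage top needs no line parsing.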

v0.7 verbs — Home Assistant

Cover exactly the two things the ha MCP server can't: resolving the long-lived API token out of the cluster, and SSH to the HA host for host-level work (config files, docker, add-ons). Entity state and control (turn_on, get_state, services) stay with the MCP — actions an MCP already encodes are out of scope (see top of this doc). The value here is the same as net/dns: the non-obvious which secret, which host, which key, which flags you'd otherwise re-derive every session — agents were hand-rolling a kubectl | base64 | jq token pipeline and a bespoke ssh -o … invocation on every run because the existing home-assistant-sofia.py needs an env var set and a cwd-relative path, neither of which holds in an arbitrary session.

| Command | Tier | What it does |
|---|---|---|
| `ha token [--instance sofia\|london]` | read | print the long-lived HA API token, resolved live from k8s Secret openclaw/openclaw-secrets (skill_secrets JSON) via the ambient kubeconfig — no pre-set env var. Use as `curl -H "Authorization: Bearer $(homelab ha token)" …` |
| `ha ssh [--instance sofia\|london] [-i KEY] -- <cmd>` | write | run <cmd> on the HA host over ssh with deterministic non-interactive flags (explicit key = the invoking user's ~/.ssh/id_ed25519, no user ssh-config, no known_hosts prompt). sofia (vbarzin@192.168.1.8) is reachable from the devvm LAN; london is documented but generally remote |

--instance defaults to sofia (the devvm shares the Sofia LAN). ha token prints the bare token to stdout so it composes in $(…); it's read-tier like memory secret. ha ssh resolves the invoking user's key, so it's per-user, not tied to whoever first wrote the workflow (the user's key must be enrolled on the HA host).
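
As an illustration of what "deterministic non-interactive flags" could mean in practice, the sketch below assembles an ssh argv from standard OpenSSH options. The exact flag set used by cmd_ha.go is not shown in this README, so the choice of options here is an assumption, and `sshArgs` is a hypothetical helper.

```go
package main

// sshArgs builds an ssh argv that does not depend on the caller's ssh
// configuration. The option choice is an assumption, not the CLI's actual
// list:
//   -F /dev/null                 ignore the user's ssh config
//   -i keyPath, IdentitiesOnly   offer only the explicit key
//   BatchMode=yes                never prompt for passwords or passphrases
//   StrictHostKeyChecking=accept-new  accept unseen host keys without asking
func sshArgs(keyPath, dest string, remoteCmd []string) []string {
	args := []string{
		"-F", "/dev/null",
		"-i", keyPath,
		"-o", "IdentitiesOnly=yes",
		"-o", "BatchMode=yes",
		"-o", "StrictHostKeyChecking=accept-new",
		dest,
	}
	return append(args, remoteCmd...)
}
```

For the sofia instance the destination would be `vbarzin@192.168.1.8`, with the key path resolved from the invoking user's home directory.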

Build / install

Built from source to /usr/local/bin/homelab during devvm provisioning (scripts/workstation/setup-devvm.sh, the t3-dispatch pattern); version is stamped from cli/VERSION via ldflags. Manual build:

cd cli && go build -ldflags "-X main.version=$(cat VERSION)" -o /usr/local/bin/homelab .
go test ./...
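
The `-X main.version=…` linker flag only works if package main declares a matching string variable. A minimal sketch of that pattern follows; the "dev" default and the `versionLine` helper (including its output format) are assumptions, not the contents of main.go.

```go
package main

// version is overwritten at link time by
//   go build -ldflags "-X main.version=$(cat VERSION)"
// and keeps its default when built without that flag.
var version = "dev"

// versionLine is a hypothetical formatter for the version verb.
func versionLine() string {
	return "homelab " + version
}
```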

Legacy webhook use-cases (preserved)

This binary is also the in-cluster infra-cli image. Invocations starting with -use-case=<vpn|setup-openwrt-dns|add-email-alias|...> fall through to the original flag-based path unchanged, so the webhook handler is unaffected.
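
The fall-through amounts to a check on the first argument before verb dispatch. A sketch of that check, with `isLegacy` as a hypothetical name:

```go
package main

import "strings"

// isLegacy reports whether the invocation should take the original
// flag-based path, i.e. its first argument starts with -use-case=.
func isLegacy(args []string) bool {
	return len(args) > 0 && strings.HasPrefix(args[0], "-use-case=")
}
```

Here `args` is the argument list without the program name, so `isLegacy([]string{"-use-case=vpn"})` is true while `isLegacy([]string{"k8s", "logs"})` is false.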

Design

See infra/docs/adr/ (ADRs 0004 through 0012) for the architecture decisions.