infra/docs/adr/0012-homelab-ha-verbs.md
Viktor Barzin b1bbe42821
Some checks are pending
Build infra CLI / build (push) Waiting to run
ci/woodpecker/push/default Pipeline was successful
homelab ha token: dedicated openclaw/ha-tokens secret + least-priv RBAC for emo
`ha token` originally read openclaw/openclaw-secrets -> skill_secrets, which only
cluster admins can read — so it hung/failed for the non-admin operator it was
built for (emo = emil.barzin@gmail.com, OIDC group "Home Server Admins", whose
identity is deliberately barred from secrets in the openclaw namespace).

Split the HA tokens into a dedicated secret openclaw/ha-tokens (keys sofia/london)
with a Role + RoleBinding granting `get` on JUST that secret to the Home Server
Admins group (k8s RBAC can't scope to a JSON sub-key, hence a separate object).
emo now resolves the HA token with their own identity, WITHOUT gaining the rest
of skill_secrets (slack_webhook, uptime_kuma_password). openclaw's own deployment
keeps reading openclaw-secrets — purely additive.

- stacks/openclaw/ha_tokens.tf: new secret + least-privilege Role/RoleBinding
- cli/cmd_ha.go: read openclaw/ha-tokens (raw base64 per-instance key); drop JSON parse
- README + ADR-0012 updated; VERSION -> v0.7.1

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 10:45:32 +00:00

3.7 KiB
Raw Blame History

homelab Home Assistant verbs: token resolution + host SSH, not entity control

v0.7 adds ha token and ha ssh. They were chosen by mining a heavy HA operator's sessions: across ~1,900 shell commands the single most-repeated line (420×) was a hand-rolled kubectl … | base64 -d | python -c '…token' pipeline, and a bespoke ssh -o StrictHostKeyChecking=no -o … invocation was redefined as a shell function ~30× — both re-derived from scratch every session. The existing home-assistant-sofia.py already covers the API, but it goes unused from an arbitrary cwd (it needs HOME_ASSISTANT_SOFIA_TOKEN set and is referenced by a cwd-relative path), so agents bypassed it. A global verb on $PATH closes that gap for every user in every directory.

Decisions

  • Only the two gaps the ha MCP can't fill. The ha MCP server already does entity state and control (get_state, call_service, history, logs). Per the CLI's founding rule — MCP-encoded actions are out of scope (ADR-0004) — we do not reimplement on/off/list/state. We add only token resolution and host SSH, neither of which an API-only MCP can provide. The value is endpoint/secret/host resolution, exactly like net/dns (ADR-0010).
  • ha token resolves live from the cluster, not from an env var. It reads the dedicated k8s Secret openclaw/ha-tokens (one key per instance: sofia / london) via the ambient kubeconfig. This is robust to env drift — the precise failure that made agents re-derive the pipeline. Read-tier, prints the bare token to stdout so it composes in $(…), mirroring memory secret.
  • The token is split into its own least-privilege secret (stacks/openclaw/ha_tokens.tf). It was originally read from openclaw-secretsskill_secrets (a JSON blob also holding slack_webhook + uptime_kuma_password), which only cluster admins can read — so the verb hung/failed for the non-admin operator it was built for (emo = emil.barzin@gmail.com, group Home Server Admins, whose OIDC identity is barred from secrets in openclaw). ha-tokens carries only the HA tokens, with a Role+RoleBinding granting get on just that secret to the Home Server Admins group (k8s RBAC can't scope to a JSON sub-key, hence the separate object). openclaw's own deployment keeps reading openclaw-secrets — this is purely additive.
  • ha ssh is deterministic and per-user. Flags are fixed for unattended use: -F /dev/null (ignore user ssh-config), StrictHostKeyChecking=no + UserKnownHostsFile=/dev/null (no host-key prompt/record — agents have no TTY), BatchMode=yes + ConnectTimeout=10 (fail fast, never hang). The key is the invoking user's ~/.ssh/id_ed25519, so the verb isn't tied to whoever first wrote the workflow; that user's key must be enrolled on the HA host. Write-tier (runs an arbitrary remote command).
  • sofia is the default; london is structural. The devvm sits on the Sofia LAN, so vbarzin@192.168.1.8 is reachable and is the default instance. london (hassio@192.168.8.103) is in the instance map so ha token --instance london works (a pure secret read), but ha ssh --instance london generally won't connect from here — london is remote. We model it correctly rather than pretend it's reachable.
  • Scope held at two verbs. ha api (an authenticated curl passthrough for the endpoints the MCP/script don't cover — /api/template, /reload, check_config, /error_log) was deferred: once ha token exists, raw curl is already unblocked, and a generic passthrough overlaps the MCP. Re-measure via usage top (ADR-0011); add targeted sugar verbs only if those endpoints are still hand-rolled often.