infra/docs/adr/0012-homelab-ha-verbs.md
Viktor Barzin b1bbe42821
Some checks are pending
Build infra CLI / build (push) Waiting to run
ci/woodpecker/push/default Pipeline was successful
homelab ha token: dedicated openclaw/ha-tokens secret + least-priv RBAC for emo
`ha token` originally read openclaw/openclaw-secrets -> skill_secrets, which only
cluster admins can read — so it hung/failed for the non-admin operator it was
built for (emo = emil.barzin@gmail.com, OIDC group "Home Server Admins", whose
identity is deliberately barred from secrets in the openclaw namespace).

Split the HA tokens into a dedicated secret openclaw/ha-tokens (keys sofia/london)
with a Role + RoleBinding granting `get` on JUST that secret to the Home Server
Admins group (k8s RBAC can't scope to a JSON sub-key, hence a separate object).
emo now resolves the HA token with their own identity, WITHOUT gaining the rest
of skill_secrets (slack_webhook, uptime_kuma_password). openclaw's own deployment
keeps reading openclaw-secrets — purely additive.

- stacks/openclaw/ha_tokens.tf: new secret + least-privilege Role/RoleBinding
- cli/cmd_ha.go: read openclaw/ha-tokens (raw base64 per-instance key); drop JSON parse
- README + ADR-0012 updated; VERSION -> v0.7.1

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 10:45:32 +00:00

54 lines
3.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# homelab Home Assistant verbs: token resolution + host SSH, not entity control
v0.7 adds `ha token` and `ha ssh`. They were chosen by mining a heavy HA
operator's sessions: across ~1,900 shell commands the single most-repeated line
(420×) was a hand-rolled `kubectl … | base64 -d | python -c '…token'` pipeline,
and a bespoke `ssh -o StrictHostKeyChecking=no -o …` invocation was redefined as
a shell function ~30× — both re-derived from scratch every session. The existing
`home-assistant-sofia.py` already covers the *API*, but it goes unused from an
arbitrary cwd (it needs `HOME_ASSISTANT_SOFIA_TOKEN` set and is referenced by a
cwd-relative path), so agents bypassed it. A global verb on `$PATH` closes that
gap for every user in every directory.
## Decisions
- **Only the two gaps the `ha` MCP can't fill.** The `ha` MCP server already
does entity state and control (`get_state`, `call_service`, history, logs).
Per the CLI's founding rule — *MCP-encoded actions are out of scope* (ADR-0004)
— we do **not** reimplement `on`/`off`/`list`/`state`. We add only token
*resolution* and host *SSH*, neither of which an API-only MCP can provide. The
value is endpoint/secret/host resolution, exactly like `net`/`dns` (ADR-0010).
- **`ha token` resolves live from the cluster, not from an env var.** It reads
the dedicated k8s Secret `openclaw/ha-tokens` (one key per instance: `sofia` /
`london`) via the ambient kubeconfig. This is robust to env drift — the precise
failure that made agents re-derive the pipeline. Read-tier, prints the bare
token to stdout so it composes in `$(…)`, mirroring `memory secret`.
- **The token is split into its own least-privilege secret** (`stacks/openclaw/ha_tokens.tf`).
It was originally read from `openclaw-secrets``skill_secrets` (a JSON blob
also holding `slack_webhook` + `uptime_kuma_password`), which only cluster
admins can read — so the verb hung/failed for the non-admin operator it was
built for (emo = `emil.barzin@gmail.com`, group `Home Server Admins`, whose
OIDC identity is barred from secrets in `openclaw`). `ha-tokens` carries only
the HA tokens, with a Role+RoleBinding granting `get` on *just that secret* to
the `Home Server Admins` group (k8s RBAC can't scope to a JSON sub-key, hence
the separate object). openclaw's own deployment keeps reading `openclaw-secrets`
— this is purely additive.
- **`ha ssh` is deterministic and per-user.** Flags are fixed for unattended
use: `-F /dev/null` (ignore user ssh-config), `StrictHostKeyChecking=no` +
`UserKnownHostsFile=/dev/null` (no host-key prompt/record — agents have no
TTY), `BatchMode=yes` + `ConnectTimeout=10` (fail fast, never hang). The key
is the **invoking user's** `~/.ssh/id_ed25519`, so the verb isn't tied to
whoever first wrote the workflow; that user's key must be enrolled on the HA
host. Write-tier (runs an arbitrary remote command).
- **sofia is the default; london is structural.** The devvm sits on the Sofia
LAN, so `vbarzin@192.168.1.8` is reachable and is the default instance. london
(`hassio@192.168.8.103`) is in the instance map so `ha token --instance london`
works (a pure secret read), but `ha ssh --instance london` generally won't
connect from here — london is remote. We model it correctly rather than
pretend it's reachable.
- **Scope held at two verbs.** `ha api` (an authenticated curl passthrough for
the endpoints the MCP/script don't cover — `/api/template`, `/reload`,
`check_config`, `/error_log`) was deferred: once `ha token` exists, raw curl is
already unblocked, and a generic passthrough overlaps the MCP. Re-measure via
`usage top` (ADR-0011); add targeted sugar verbs only if those endpoints are
still hand-rolled often.