homelab: v0.6.0 — usage telemetry (usage top), evidence-driven verb prioritization

Answers the question that drove the whole CLI (which verbs to add next) with data instead of one maintainer's habits, and resolves the cross-user-usage ask in-bounds (no reading anyone's home).

- emit on dispatch: every verb fire-and-forgets one Loki line labeled {job,user,verb} with body "exit=N ver=X". ONLY the verb path + exit code, never args, paths, flags, or secrets (the emit never sees arguments). Best-effort: 800ms timeout, errors swallowed, never affects the command; opt-out HOMELAB_TELEMETRY=0. Discovery verbs (manifest/version/help) and usage itself don't self-record.
- usage top [--since 30d] [--user U] [--json]: ranks verbs via sum by (verb)(count_over_time({job="homelab-usage"}[…])) against the shared Loki. Cross-user analytics WITHOUT touching ~/.claude: the privacy-preserving answer to "what does the team use".
- Loki sink (zero new infra, dogfoods the v0.5 logs path); push verified: HTTP 204, no auth.

ADR docs/adr/0011. Live-verified: ran 4 verbs, usage top ranked them correctly (metrics query=2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 22:29:01 +00:00 · 2026-06-19 22:29:01 +00:00 · 3e3fdb34f0
commit 3e3fdb34f0
parent 666fefd22b
9 changed files with 215 additions and 4 deletions
--- a/docs/adr/0011-homelab-usage-telemetry.md
+++ b/docs/adr/0011-homelab-usage-telemetry.md
@@ -0,0 +1,34 @@
+# homelab usage telemetry: evidence-driven verb prioritization, privacy by construction
+
+v0.6 adds `usage top` plus a fire-and-forget emit on every dispatched verb. It
+exists to answer the question that drove the whole CLI — *which verbs are worth
+adding next* — with data instead of one maintainer's habits (the earlier mining
+covered a single user's ~51k commands, so the surface is shaped to that user).
+
+## Decisions
+
+- **Emit on dispatch, in `dispatch()`.** The longest-prefix match already knows
+  the verb path; after `Run` returns we emit `{verb, exit}`. Discovery verbs
+  don't go through `dispatch()` (`manifest`/`version`/`help` are handled in
+  `dispatchTop`), so they don't self-record; `usage *` is skipped explicitly so
+  the analytics reader doesn't pollute its own data.
+- **Payload is deliberately minimal: verb path + exit code only.** Labels
+  `{job=homelab-usage, user, verb}` (all low-cardinality) + line `exit=N ver=X`.
+  **No args, paths, flags, hostnames, or secrets** ever leave the process — the
+  emit sees only the matched verb name, not the arguments. This is what makes
+  cross-user aggregation safe.
+- **Shared Loki sink → cross-user analytics WITHOUT reading homes.** Each user's
+  CLI writes its own invocations (attributed to its OS user) to the shared Loki
+  push API via the Traefik LB (verified: HTTP 204, no auth). `usage top` reads
+  them back with a LogQL metric query. This is the privacy-preserving answer to
+  "what do other users use": it never touches anyone's `~/.claude`, which the
+  org's per-user policy bars (see the per-user red line in managed-settings).
+  Reading another user's home is off-limits even for an owner in-session; a
+  fresh session under a changed MDM policy is the only legitimate path, and even
+  then this telemetry is the better answer.
+- **Best-effort, never affects the command.** All errors swallowed; an 800ms
+  client timeout bounds the cost; opt-out via `HOMELAB_TELEMETRY=0`. Telemetry
+  must never slow or break the tool it measures.
+- **Loki, not a new datastore.** Zero new infra, and it dogfoods the v0.5 `logs`
+  path (same host, same LB dial). Presence MySQL was the alternative (queryable
+  SQL) but would add a write dependency and creds; Loki needs neither.
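The "emit on dispatch" decision (match, run, then emit only the verb path and exit code, skipping `usage *`) can be sketched in Go. This is a minimal illustration under stated assumptions, not the CLI's code: the `Verb` type, its `Run(args) int` signature, the `emitFunc` callback and exit code 2 for an unknown verb are all hypothetical.

```go
package main

import "strings"

// Verb is a hypothetical stand-in for a registered CLI verb.
type Verb struct {
	Path string                  // space-separated verb path, e.g. "metrics query"
	Run  func(args []string) int // returns the process exit code
}

// emitFunc is the telemetry hook: it receives the verb path and exit code only.
type emitFunc func(verbPath string, exit int)

// longestPrefix returns the verb whose path matches the most leading words of
// argv, plus the remaining words as that verb's arguments.
func longestPrefix(verbs []Verb, argv []string) (*Verb, []string) {
	var best *Verb
	bestLen := 0
	for i := range verbs {
		words := strings.Fields(verbs[i].Path)
		if len(words) <= bestLen || len(words) > len(argv) {
			continue // cannot beat the current best, or longer than argv
		}
		match := true
		for j, w := range words {
			if argv[j] != w {
				match = false
				break
			}
		}
		if match {
			best, bestLen = &verbs[i], len(words)
		}
	}
	if best == nil {
		return nil, argv
	}
	return best, argv[bestLen:]
}

// dispatch runs the matched verb, then emits {verb, exit}. Verbs under
// "usage" are skipped so the analytics reader does not record itself.
func dispatch(verbs []Verb, argv []string, emit emitFunc) int {
	v, args := longestPrefix(verbs, argv)
	if v == nil {
		return 2 // hypothetical "unknown verb" exit code
	}
	exit := v.Run(args)
	if v.Path != "usage" && !strings.HasPrefix(v.Path, "usage ") {
		emit(v.Path, exit) // args never reach the emitter
	}
	return exit
}
```

Passing the emitter only `v.Path` and `exit` is what makes the "no args ever leave the process" property hold by construction: the telemetry code has no way to see the arguments.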
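On the read side, `usage top` ranks verbs with the LogQL metric query quoted in the commit message. The Go sketch below shows one way to build that query and rank the result. It assumes an instant query against Loki's `/loki/api/v1/query` endpoint, which returns a `vector` result of `{metric, value}` entries with the count encoded as a string, and assumes `--since` and `--user` are spliced into the range selector and label matcher. `topQuery`, `rankVector` and `rank` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
)

// topQuery builds the ranking query; since (e.g. "30d") becomes the range,
// and a non-empty user narrows the stream selector to that OS user.
func topQuery(since, user string) string {
	selector := `job="homelab-usage"`
	if user != "" {
		// %q quotes Go-style, which matches LogQL's double-quoted strings
		// for ordinary user names.
		selector += fmt.Sprintf(`, user=%q`, user)
	}
	return fmt.Sprintf(`sum by (verb)(count_over_time({%s}[%s]))`, selector, since)
}

type rank struct {
	Verb  string
	Count float64
}

// rankVector parses an instant-query response and sorts verbs by count,
// descending, breaking ties by name so the output is stable.
func rankVector(body []byte) ([]rank, error) {
	var resp struct {
		Data struct {
			ResultType string `json:"resultType"`
			Result     []struct {
				Metric map[string]string `json:"metric"`
				Value  [2]any            `json:"value"` // [unix seconds, "count"]
			} `json:"result"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected resultType %q", resp.Data.ResultType)
	}
	ranks := make([]rank, 0, len(resp.Data.Result))
	for _, r := range resp.Data.Result {
		s, ok := r.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("count for %q is not a string", r.Metric["verb"])
		}
		n, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		ranks = append(ranks, rank{Verb: r.Metric["verb"], Count: n})
	}
	sort.Slice(ranks, func(i, j int) bool {
		if ranks[i].Count != ranks[j].Count {
			return ranks[i].Count > ranks[j].Count
		}
		return ranks[i].Verb < ranks[j].Verb
	})
	return ranks, nil
}
```

Because `sum by (verb)` aggregates across the `user` label, the unfiltered query gives team-wide ranks without any per-home access; `--user` only adds a label matcher.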