infra/docs/adr/0011-homelab-usage-telemetry.md
Viktor Barzin 3e3fdb34f0
homelab: v0.6.0 — usage telemetry (usage top), evidence-driven verb prioritization
Answers the question that drove the whole CLI — which verbs to add next — with
data instead of one maintainer's habits, and resolves the cross-user-usage ask
in-bounds (no reading anyone's home).

- emit on dispatch: every verb fire-and-forgets one Loki line {job,user,verb} +
  "exit=N ver=X". ONLY the verb path + exit code — never args, paths, flags, or
  secrets (the emit never sees arguments). Best-effort: 800ms timeout, errors
  swallowed, never affects the command; opt-out HOMELAB_TELEMETRY=0. Discovery
  verbs (manifest/version/help) and usage itself don't self-record.
- usage top [--since 30d] [--user U] [--json]: ranks verbs via
  sum by (verb)(count_over_time({job="homelab-usage"}[…])) against the shared
  Loki. Cross-user analytics WITHOUT touching ~/.claude — the privacy-preserving
  answer to "what does the team use".
- Loki sink (zero new infra, dogfoods v0.5 logs path); push verified HTTP 204 no
  auth. ADR docs/adr/0011.

Live-verified: ran 4 verbs, usage top ranked them correctly (metrics query=2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 22:29:01 +00:00


homelab usage telemetry: evidence-driven verb prioritization, privacy by construction

v0.6 adds usage top plus a fire-and-forget emit on every dispatched verb. It exists to answer the question that drove the whole CLI (which verbs are worth adding next) with data instead of one maintainer's habits. The earlier mining covered a single user's ~51k commands, so the current verb surface is shaped to that one user.

Decisions

  • Emit on dispatch, in dispatch(). The longest-prefix match already knows the verb path; after Run returns we emit {verb, exit}. Discovery verbs don't go through dispatch() (manifest/version/help are handled in dispatchTop), so they don't self-record; usage * is skipped explicitly so the analytics reader doesn't pollute its own data.
  • Payload is deliberately minimal: verb path + exit code only. Labels {job=homelab-usage, user, verb} (all low-cardinality) + line exit=N ver=X. No args, paths, flags, hostnames, or secrets ever leave the process — the emit sees only the matched verb name, not the arguments. This is what makes cross-user aggregation safe.
  • Shared Loki sink → cross-user analytics WITHOUT reading homes. Each user's CLI writes its own invocations (attributed to its OS user) to the shared Loki push API via the Traefik LB (verified: HTTP 204, no auth). usage top reads them back with a LogQL metric query. This is the privacy-preserving resolution to "what does everyone (e.g. another user) use": it never touches anyone's ~/.claude, which the org per-user policy bars (see the per-user red-line in managed-settings). Reading another user's home is off-limits even for an owner in-session. A fresh session under changed MDM policy is the only legitimate path, and even then this telemetry is the better answer.
  • Best-effort, never affects the command. All errors swallowed; an 800ms client timeout bounds the cost; opt-out via HOMELAB_TELEMETRY=0. Telemetry must never slow or break the tool it measures.
  • Loki, not a new datastore. Zero new infra, and it dogfoods the v0.5 logs path (same host, same LB dial). Presence MySQL was the alternative (queryable SQL) but would add a write dependency and creds; Loki needs neither.