infra/docs/adr/0011-homelab-usage-telemetry.md
Viktor Barzin 8121d8a4ac
docs(adr): add ADR-0015 (OS/sudo is the authorization boundary), supersede ADR-0011 privacy norm
Viktor (owner) wants agents to stop refusing file reads the OS already permits. wizard holds passwordless root ((ALL) NOPASSWD: ALL), so the managed-settings rule 'never read another user's ~/.claude' was stricter than the OS itself. The managed-settings policy (/etc/claude-code/managed-settings.json) was updated out-of-band to defer to OS/sudo authorization with no extra prompt; backup kept at .bak-2026-06-26. This ADR records the decision, its symmetry across sudo-holders, and the larger blast radius.

ADR-0011's usage-telemetry design is unchanged; only the cross-user privacy norm it referenced is superseded. The original ask was to delete ADR-0011 — superseded instead to preserve the audit trail and the ADR-0012/0013 references.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 08:22:29 +00:00

homelab usage telemetry: evidence-driven verb prioritization, privacy by construction

v0.6 adds usage top plus a fire-and-forget emit on every dispatched verb. It exists to answer the question that drove the whole CLI — which verbs are worth adding next — with data instead of one maintainer's habits (the earlier mining covered a single user's ~51k commands, so the surface is shaped to that user).

Update (2026-06-26) — the cross-user privacy norm below is superseded by ADR-0015. The prohibition this ADR leaned on ("reading another user's ~/.claude is off-limits even for an owner in-session") no longer holds: the managed-settings policy now defers to OS/sudo authorization. The usage top telemetry design itself is unchanged and still current — only the "never read homes" framing in the third decision below is overtaken.

Decisions

  • Emit on dispatch, in dispatch(). The longest-prefix match already knows the verb path; after Run returns we emit {verb, exit}. Discovery verbs don't go through dispatch() (manifest/version/help are handled in dispatchTop), so they don't self-record; usage * is skipped explicitly so the analytics reader doesn't pollute its own data.
  • Payload is deliberately minimal: verb path + exit code only. It is sent as Loki stream labels {job=homelab-usage, user, verb} (all low-cardinality) plus the log line exit=N ver=X. No args, paths, flags, hostnames, or secrets ever leave the process: the emit sees only the matched verb name, not the arguments. This is what makes cross-user aggregation safe.
  • Shared Loki sink → cross-user analytics WITHOUT reading homes. Each user's CLI writes its own invocations (attributed to its OS user) to the shared Loki push API via the Traefik LB (verified: HTTP 204, no auth). usage top reads them back with a LogQL metric query. This is the privacy-preserving resolution to "what does everyone (e.g. another user) use": it never touches anyone's ~/.claude, which the org per-user policy bars. See the per-user red-line in managed-settings: reading another user's home is off-limits even for an owner in-session. A fresh session under changed MDM policy is the only legitimate path, and even then this telemetry is the better answer.
  • Best-effort, never affects the command. All errors swallowed; an 800ms client timeout bounds the cost; opt-out via HOMELAB_TELEMETRY=0. Telemetry must never slow or break the tool it measures.
  • Loki, not a new datastore. Zero new infra, and it dogfoods the v0.5 logs path (same host, same LB dial). Presence MySQL was the alternative (queryable SQL) but would add a write dependency and creds; Loki needs neither.