From ef555c7e02fb2fb9fcedc441c7e5ec48619159cb Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Mon, 15 Jun 2026 17:20:03 +0000
Subject: [PATCH] workstation: put ~/.local/bin on PATH so the launcher finds native claude
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Viktor hit "~/.local/bin is not part of the PATH". Root cause: the native
claude binary lives in ~/.local/bin, but the terminal launcher
(start-claude.sh) runs in tmux's NON-login bash env, which doesn't source
the user's shell rc where the native installer put ~/.local/bin on PATH.
So `command -v claude` failed there → the launcher's bootstrap re-ran the
native installer → the installer printed the PATH warning. (Interactive
zsh already had ~/.local/bin via the per-user installer rc edit, and
t3-serve sets PATH in its unit — so only the terminal launcher was
affected.)

- skel/start-claude.sh: prepend ~/.local/bin to PATH near the top
  (guarded/idempotent), before the launch logic — so `claude` is found,
  no reinstall, no warning.
- setup-devvm.sh: install /etc/profile.d/10-local-bin.sh — adds
  ~/.local/bin to PATH for all LOGIN shells machine-wide (SSH etc.),
  independent of the per-user installer rc edit (fresh-user-safe). zsh
  login picks it up via /etc/zsh/zprofile -> /etc/profile.
- docs/architecture/multi-tenancy.md: documented the three PATH-injection
  points.

Verified: guard adds-when-missing / no-dup-when-present; all scripts pass
bash -n.
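The guard check noted under "Verified" can be reproduced with a sketch along these lines. The function names `ensure_local_bin` and `count_local_bin` and the sample PATH value are hypothetical, introduced only for illustration; the `case` guard itself mirrors the one this patch adds to both scripts.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the same guard this patch adds.
ensure_local_bin() {
  case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;                   # already present: no-op
    *) export PATH="$HOME/.local/bin:$PATH" ;;   # missing: prepend once
  esac
}

# Hypothetical helper: count PATH entries that are exactly ~/.local/bin.
count_local_bin() {
  printf '%s\n' "$PATH" | tr ':' '\n' | grep -cxF -- "$HOME/.local/bin"
}

# Demo in a subshell so the caller's PATH is left untouched.
# Assumes tr and grep are reachable via /usr/bin or /bin.
(
  PATH="/usr/bin:/bin"
  ensure_local_bin   # adds-when-missing
  ensure_local_bin   # no-dup-when-present
  echo "entries for ~/.local/bin: $(count_local_bin)"
)
```

Wrapping PATH in colons before matching is what lets the pattern catch the directory as a first, middle, or last entry without matching a longer path that merely contains it.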
Co-Authored-By: Claude Opus 4.8
---
 docs/architecture/multi-tenancy.md       |  2 +-
 scripts/workstation/setup-devvm.sh       | 17 +++++++++++++++++
 scripts/workstation/skel/start-claude.sh |  8 ++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/docs/architecture/multi-tenancy.md b/docs/architecture/multi-tenancy.md
index 17163820..7764ebb1 100644
--- a/docs/architecture/multi-tenancy.md
+++ b/docs/architecture/multi-tenancy.md
@@ -545,7 +545,7 @@ Separate from the in-cluster namespace-owner model above, the **devvm** (`10.0.1
 **Onboarding state self-heals (2026-06-15):** `~/.claude.json` is a single file that ALL of a user's concurrent `claude` processes (the ttyd terminal + their `t3-serve` instance + agent/SDK sessions) read-modify-write, so a stale writer periodically drops top-level keys — including `hasCompletedOnboarding` — which bounces the next *interactive* session back to the first-run "Choose the text style" wizard even though the user is fully logged in (credentials live in the SEPARATE `~/.claude/.credentials.json`, untouched by the race; first observed for emo 2026-06-15). The launcher (`skel/start-claude.sh`) now idempotently re-asserts `hasCompletedOnboarding` (+ `lastOnboardingVersion`) in `~/.claude.json` right before it runs `claude` — merge-only, never clobbers other keys, no-op if jq is missing or the file is empty/corrupt. And since the launcher is a per-user copy that `/etc/skel` only seeds at account creation, the reconcile's new `deploy_user_launcher` step re-copies `skel/start-claude.sh` into every non-admin home (copy-if-changed) so launcher edits now reach EXISTING users within the hour — `.tmux.conf` is deliberately NOT re-copied (terminal-lobby appends its own managed section to it).
-**Claude Code runtime — native, per-user (2026-06-15):** `claude` is the **native** install (`~/.local/bin/claude` → `~/.local/share/claude/versions/`, self-updating; `installMethod: native`) — NOT npm-global or npx.
It is the runtime for both the ttyd launcher and each `t3-serve` instance. `setup-devvm.sh` installs node ONLY for the `t3` CLI (not claude); per-user native claude is provisioned by the reconcile's `install_user_claude_native` (covers terminal + t3, idempotent, skip-if-present) and self-bootstrapped by `start-claude.sh` on first launch — both via the official `https://claude.ai/install.sh`. The legacy machine-wide `npm install -g @anthropic-ai/claude-code` bootstrap and the launcher's `npx` fallback were removed; existing users had already auto-migrated to native, and the npm-global dir was empty.
+**Claude Code runtime — native, per-user (2026-06-15):** `claude` is the **native** install (`~/.local/bin/claude` → `~/.local/share/claude/versions/`, self-updating; `installMethod: native`) — NOT npm-global or npx. It is the runtime for both the ttyd launcher and each `t3-serve` instance. `setup-devvm.sh` installs node ONLY for the `t3` CLI (not claude); per-user native claude is provisioned by the reconcile's `install_user_claude_native` (covers terminal + t3, idempotent, skip-if-present) and self-bootstrapped by `start-claude.sh` on first launch — both via the official `https://claude.ai/install.sh`. The legacy machine-wide `npm install -g @anthropic-ai/claude-code` bootstrap and the launcher's `npx` fallback were removed; existing users had already auto-migrated to native, and the npm-global dir was empty. **PATH (`~/.local/bin`, where the native binary lives):** ensured three ways — `/etc/profile.d/10-local-bin.sh` for login shells (machine-wide, fresh-user-safe), `start-claude.sh` itself (the launcher runs in tmux's non-login env that skips the user's shell rc), and `t3-serve@.service` (`Environment=PATH=…:/home/%i/.local/bin`).
 **Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext.
Its location depends on the per-user `code_layout` in `roster.yaml`: `single` (default) puts the clone AT `~/code`; `workspace` makes `~/code` a plain directory of per-project clones — the infra clone at `~/code/infra` plus each roster `repos` entry cloned from Forgejo `viktor/` **as the user** (their PAT authenticates, so private repos work; clone failures WARN and retry next hour). Flipping a user to `workspace` auto-migrates their existing `~/code` clone to `~/code/infra` (local branches/dirty state survive; running processes follow the moved inode). ancamilea = workspace + `tripit` since 2026-06-10. The provisioner clones infra anonymously from the public GitHub mirror; **contribute access is wired per-user on top** (see below). The apply boundary still holds (`scripts/tg apply` needs an admin Vault token + cluster RBAC), but **pushing `master` is NOT inert** — the Forgejo→Woodpecker webhook fires `.woodpecker/default.yml` (`event: push, branch: master`, `require_approval: forks` only), which terragrunt-applies changed stacks. `master` is **branch-protected on Forgejo** (force-push disabled for everyone — history is append-only; push + merge whitelists = `viktor` + explicitly granted users, deploy keys allowed). **Allow-then-audit (Viktor, 2026-06-10):** `ebarzin` (emo) is on the whitelist and pushes straight to `master` — no PR gate. The tracking burden moves to: (a) **commit messages that record what + why** (the agent instructions in AGENTS.md and the managed claudeMd require the body to paraphrase the user's request), (b) the **`notify-nonadmin-push` Slack audit step** in `.woodpecker/default.yml` — every master push by a non-admin author is posted to Slack (admin pushes are not), and (c) non-admins **never use `[ci skip]`** so every change fires the pipeline (and thus the audit feed). Users NOT on the whitelist fall back to `/` branches + PRs. 
**Clones stay fresh automatically** (2026-06-10): the hourly `t3-provision-users` reconcile runs `refresh_user_clone` over every managed clone — the infra clone and any workspace repos (fetch all remotes + fast-forward `master`, ONLY when on master with a clean tree and an upstream — dirty trees and local commits are left alone with a WARN) — and also `wire_forgejo_remote`, which idempotently adds the documented `forgejo` remote + `forgejo/master` upstream to infra clones that predate that contract. `start-claude.sh` does the same freshen at session launch (10s fetch cap per repo so an offline remote never stalls the session; workspace layouts freshen each repo under `~/code`).
diff --git a/scripts/workstation/setup-devvm.sh b/scripts/workstation/setup-devvm.sh
index be6e0e12..b0275bbf 100755
--- a/scripts/workstation/setup-devvm.sh
+++ b/scripts/workstation/setup-devvm.sh
@@ -38,6 +38,23 @@ if [[ $need_node -eq 1 ]]; then
   apt-get install -y nodejs >/dev/null
 fi
 
+# 2a) ~/.local/bin on PATH for all LOGIN shells (machine-wide). The native claude install
+# lives at ~/.local/bin; this guarantees login shells (SSH, etc.) find it regardless of
+# whether the per-user native-installer rc edit ran. (The terminal launcher sets PATH
+# itself, and t3-serve@.service hard-sets PATH in the unit.)
+install -d -m 0755 /etc/profile.d
+cat > /etc/profile.d/10-local-bin.sh <<'PROFILE_EOF'
+# Native per-user installs (e.g. claude-code) live in ~/.local/bin — put it on PATH.
+# Guarded so it never duplicates. Sourced by login shells (bash via /etc/profile; zsh
+# login via /etc/zsh/zprofile -> /etc/profile).
+case ":$PATH:" in
+  *":$HOME/.local/bin:"*) ;;
+  *) export PATH="$HOME/.local/bin:$PATH" ;;
+esac
+PROFILE_EOF
+chmod 0644 /etc/profile.d/10-local-bin.sh
+log "/etc/profile.d/10-local-bin.sh (~/.local/bin on PATH for login shells)"
+
 # 2b) t3 (the per-user coding surface) — PINNED, never nightly/latest. t3 is pre-1.0 and
 # ships breaking auth-schema + bootstrap-API changes our t3-dispatch can't follow blind
 # (2026-06-09 outage: a nightly auto-update broke pairing for ALL users). The daily
diff --git a/scripts/workstation/skel/start-claude.sh b/scripts/workstation/skel/start-claude.sh
index 2353eace..b3e25744 100755
--- a/scripts/workstation/skel/start-claude.sh
+++ b/scripts/workstation/skel/start-claude.sh
@@ -11,6 +11,14 @@ echo " Starting Claude Code in $HOME/code ..."
 echo " (Right-click for tmux menu, or Ctrl+B then | or - to split)"
 echo ""
 
+# The native claude install lives in ~/.local/bin. This launcher runs in tmux's non-login
+# env, which does NOT source the user's shell rc (where the native installer added it to
+# PATH) — so `claude` would appear missing here. Put it on PATH ourselves; guarded/idempotent.
+case ":$PATH:" in
+  *":$HOME/.local/bin:"*) ;;
+  *) export PATH="$HOME/.local/bin:$PATH" ;;
+esac
+
 name_args=()
 if [ -n "${TMUX:-}" ]; then
   sess="$(tmux display-message -p '#{session_name}' 2>/dev/null)"