workstation: standardize on the native claude install (drop npm-global + npx)
All checks were successful
ci/woodpecker/push/default Pipeline was successful

Question from Viktor: should claude run via the binary or npx? Answer: the native
install is the recommended runtime (self-contained, self-updating ~/.local/bin/claude;
installMethod=native) — and every existing user had already auto-migrated to it, leaving
the npm-global copy empty and the npx fallback dead. "Leave only the recommended setup":

- setup-devvm.sh: node is now installed ONLY for the t3 CLI; dropped the machine-wide
  `npm install -g @anthropic-ai/claude-code` (npm/npx is not the recommended runtime and
  just shadowed the per-user native installs).
- t3-provision-users.sh: new per-user `install_user_claude_native` (runs the official
  https://claude.ai/install.sh AS the user, idempotent/skip-if-present) — provisions native
  claude for BOTH the terminal launcher and each t3-serve instance, replacing the npm bootstrap.
- skel/start-claude.sh: launcher runs the native `claude` only; if missing it bootstraps via
  the native installer (was an `npx @anthropic-ai/claude-code` fallback).
- docs/architecture/multi-tenancy.md: documented the native-only runtime model.

node stays (the pinned t3 CLI is npm-global). Verified: native installer reachable +
produces ~/.local/bin/claude 2.1.177; all three scripts pass bash -n + shellcheck.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-06-15 17:12:05 +00:00
parent 4a48f065e9
commit eecd78233b
4 changed files with 37 additions and 14 deletions

View file

@ -545,6 +545,8 @@ Separate from the in-cluster namespace-owner model above, the **devvm** (`10.0.1
**Onboarding state self-heals (2026-06-15):** `~/.claude.json` is a single file that ALL of a user's concurrent `claude` processes (the ttyd terminal + their `t3-serve` instance + agent/SDK sessions) read-modify-write, so a stale writer periodically drops top-level keys — including `hasCompletedOnboarding` — which bounces the next *interactive* session back to the first-run "Choose the text style" wizard even though the user is fully logged in (credentials live in the SEPARATE `~/.claude/.credentials.json`, untouched by the race; first observed for emo 2026-06-15). The launcher (`skel/start-claude.sh`) now idempotently re-asserts `hasCompletedOnboarding` (+ `lastOnboardingVersion`) in `~/.claude.json` right before it runs `claude` — merge-only, never clobbers other keys, no-op if jq is missing or the file is empty/corrupt. And since the launcher is a per-user copy that `/etc/skel` only seeds at account creation, the reconcile's new `deploy_user_launcher` step re-copies `skel/start-claude.sh` into every non-admin home (copy-if-changed) so launcher edits now reach EXISTING users within the hour — `.tmux.conf` is deliberately NOT re-copied (terminal-lobby appends its own managed section to it). **Onboarding state self-heals (2026-06-15):** `~/.claude.json` is a single file that ALL of a user's concurrent `claude` processes (the ttyd terminal + their `t3-serve` instance + agent/SDK sessions) read-modify-write, so a stale writer periodically drops top-level keys — including `hasCompletedOnboarding` — which bounces the next *interactive* session back to the first-run "Choose the text style" wizard even though the user is fully logged in (credentials live in the SEPARATE `~/.claude/.credentials.json`, untouched by the race; first observed for emo 2026-06-15). The launcher (`skel/start-claude.sh`) now idempotently re-asserts `hasCompletedOnboarding` (+ `lastOnboardingVersion`) in `~/.claude.json` right before it runs `claude` — merge-only, never clobbers other keys, no-op if jq is missing or the file is empty/corrupt. And since the launcher is a per-user copy that `/etc/skel` only seeds at account creation, the reconcile's new `deploy_user_launcher` step re-copies `skel/start-claude.sh` into every non-admin home (copy-if-changed) so launcher edits now reach EXISTING users within the hour — `.tmux.conf` is deliberately NOT re-copied (terminal-lobby appends its own managed section to it).
**Claude Code runtime — native, per-user (2026-06-15):** `claude` is the **native** install (`~/.local/bin/claude``~/.local/share/claude/versions/<v>`, self-updating; `installMethod: native`) — NOT npm-global or npx. It is the runtime for both the ttyd launcher and each `t3-serve` instance. `setup-devvm.sh` installs node ONLY for the `t3` CLI (not claude); per-user native claude is provisioned by the reconcile's `install_user_claude_native` (covers terminal + t3, idempotent, skip-if-present) and self-bootstrapped by `start-claude.sh` on first launch — both via the official `https://claude.ai/install.sh`. The legacy machine-wide `npm install -g @anthropic-ai/claude-code` bootstrap and the launcher's `npx` fallback were removed; existing users had already auto-migrated to native, and the npm-global dir was empty.
**Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext. Its location depends on the per-user `code_layout` in `roster.yaml`: `single` (default) puts the clone AT `~/code`; `workspace` makes `~/code` a plain directory of per-project clones — the infra clone at `~/code/infra` plus each roster `repos` entry cloned from Forgejo `viktor/<name>` **as the user** (their PAT authenticates, so private repos work; clone failures WARN and retry next hour). Flipping a user to `workspace` auto-migrates their existing `~/code` clone to `~/code/infra` (local branches/dirty state survive; running processes follow the moved inode). ancamilea = workspace + `tripit` since 2026-06-10. The provisioner clones infra anonymously from the public GitHub mirror; **contribute access is wired per-user on top** (see below). The apply boundary still holds (`scripts/tg apply` needs an admin Vault token + cluster RBAC), but **pushing `master` is NOT inert** — the Forgejo→Woodpecker webhook fires `.woodpecker/default.yml` (`event: push, branch: master`, `require_approval: forks` only), which terragrunt-applies changed stacks. `master` is **branch-protected on Forgejo** (force-push disabled for everyone — history is append-only; push + merge whitelists = `viktor` + explicitly granted users, deploy keys allowed). **Allow-then-audit (Viktor, 2026-06-10):** `ebarzin` (emo) is on the whitelist and pushes straight to `master` — no PR gate. The tracking burden moves to: (a) **commit messages that record what + why** (the agent instructions in AGENTS.md and the managed claudeMd require the body to paraphrase the user's request), (b) the **`notify-nonadmin-push` Slack audit step** in `.woodpecker/default.yml` — every master push by a non-admin author is posted to Slack (admin pushes are not), and (c) non-admins **never use `[ci skip]`** so every change fires the pipeline (and thus the audit feed). Users NOT on the whitelist fall back to `<user>/<topic>` branches + PRs. **Clones stay fresh automatically** (2026-06-10): the hourly `t3-provision-users` reconcile runs `refresh_user_clone` over every managed clone — the infra clone and any workspace repos (fetch all remotes + fast-forward `master`, ONLY when on master with a clean tree and an upstream — dirty trees and local commits are left alone with a WARN) — and also `wire_forgejo_remote`, which idempotently adds the documented `forgejo` remote + `forgejo/master` upstream to infra clones that predate that contract. `start-claude.sh` does the same freshen at session launch (10s fetch cap per repo so an offline remote never stalls the session; workspace layouts freshen each repo under `~/code`). **Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext. Its location depends on the per-user `code_layout` in `roster.yaml`: `single` (default) puts the clone AT `~/code`; `workspace` makes `~/code` a plain directory of per-project clones — the infra clone at `~/code/infra` plus each roster `repos` entry cloned from Forgejo `viktor/<name>` **as the user** (their PAT authenticates, so private repos work; clone failures WARN and retry next hour). Flipping a user to `workspace` auto-migrates their existing `~/code` clone to `~/code/infra` (local branches/dirty state survive; running processes follow the moved inode). ancamilea = workspace + `tripit` since 2026-06-10. The provisioner clones infra anonymously from the public GitHub mirror; **contribute access is wired per-user on top** (see below). The apply boundary still holds (`scripts/tg apply` needs an admin Vault token + cluster RBAC), but **pushing `master` is NOT inert** — the Forgejo→Woodpecker webhook fires `.woodpecker/default.yml` (`event: push, branch: master`, `require_approval: forks` only), which terragrunt-applies changed stacks. `master` is **branch-protected on Forgejo** (force-push disabled for everyone — history is append-only; push + merge whitelists = `viktor` + explicitly granted users, deploy keys allowed). **Allow-then-audit (Viktor, 2026-06-10):** `ebarzin` (emo) is on the whitelist and pushes straight to `master` — no PR gate. The tracking burden moves to: (a) **commit messages that record what + why** (the agent instructions in AGENTS.md and the managed claudeMd require the body to paraphrase the user's request), (b) the **`notify-nonadmin-push` Slack audit step** in `.woodpecker/default.yml` — every master push by a non-admin author is posted to Slack (admin pushes are not), and (c) non-admins **never use `[ci skip]`** so every change fires the pipeline (and thus the audit feed). Users NOT on the whitelist fall back to `<user>/<topic>` branches + PRs. **Clones stay fresh automatically** (2026-06-10): the hourly `t3-provision-users` reconcile runs `refresh_user_clone` over every managed clone — the infra clone and any workspace repos (fetch all remotes + fast-forward `master`, ONLY when on master with a clean tree and an upstream — dirty trees and local commits are left alone with a WARN) — and also `wire_forgejo_remote`, which idempotently adds the documented `forgejo` remote + `forgejo/master` upstream to infra clones that predate that contract. `start-claude.sh` does the same freshen at session launch (10s fetch cap per repo so an offline remote never stalls the session; workspace layouts freshen each repo under `~/code`).
**Contribute access (per non-admin, manual — the anca/tripit PAT precedent):** **Contribute access (per non-admin, manual — the anca/tripit PAT precedent):**

View file

@ -288,6 +288,25 @@ deploy_user_launcher() {
log "deployed start-claude.sh -> $user" log "deployed start-claude.sh -> $user"
} }
# Ensure the per-user NATIVE claude install (the recommended runtime: ~user/.local/bin/claude,
# self-updating) — used by BOTH the terminal launcher AND the user's t3-serve instance. We do
# NOT npm-install claude system-wide (npm/npx isn't the recommended runtime); each user gets
# their own native install. Idempotent: skip if already present. Runs the official native
# installer AS the user (into their ~/.local). Best-effort: a failure WARNs and retries next
# reconcile (start-claude.sh also self-bootstraps the terminal path).
install_user_claude_native() {
local user="$1" home
home="$(getent passwd "$user" | cut -d: -f6)"
[[ -n "$home" && -d "$home" ]] || return 0
[[ -x "$home/.local/bin/claude" ]] && return 0 # already native -> done
if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] native claude install -> $user"; return 0; fi
if runuser -u "$user" -- bash -lc 'curl -fsSL https://claude.ai/install.sh | bash' >/dev/null 2>&1; then
log "installed native claude -> $user"
else
log "WARN: native claude install failed for $user (retries next reconcile)"
fi
}
[[ $EUID -eq 0 ]] || { echo "t3-provision-users: must run as root" >&2; exit 1; } [[ $EUID -eq 0 ]] || { echo "t3-provision-users: must run as root" >&2; exit 1; }
for bin in python3 jq; do command -v "$bin" >/dev/null || { echo "missing $bin" >&2; exit 1; }; done for bin in python3 jq; do command -v "$bin" >/dev/null || { echo "missing $bin" >&2; exit 1; }; done
[[ -f "$ROSTER" && -f "$ENGINE" ]] || { echo "roster/engine not under $WORKSTATION_DIR" >&2; exit 1; } [[ -f "$ROSTER" && -f "$ENGINE" ]] || { echo "roster/engine not under $WORKSTATION_DIR" >&2; exit 1; }
@ -367,6 +386,7 @@ while IFS=$'\t' read -r os_user tier shell groups_csv code_layout repos_csv; do
deploy_user_launcher "$os_user" # keep ~/start-claude.sh current (skel only seeds new accounts) deploy_user_launcher "$os_user" # keep ~/start-claude.sh current (skel only seeds new accounts)
fi fi
refresh_codex_mirror "$os_user" # all tiers — mirror of the managed claudeMd refresh_codex_mirror "$os_user" # all tiers — mirror of the managed claudeMd
install_user_claude_native "$os_user" # all tiers — per-user native claude (terminal + t3); no npm/npx
done < <(jq -r '.accounts[] | [.os_user, .tier, .shell, (if (.groups|length)==0 then "-" else (.groups|join(",")) end), .code_layout, (if (.repos|length)==0 then "-" else (.repos|join(",")) end)] | @tsv' "$desired_file") done < <(jq -r '.accounts[] | [.os_user, .tier, .shell, (if (.groups|length)==0 then "-" else (.groups|join(",")) end), .code_layout, (if (.repos|length)==0 then "-" else (.repos|join(",")) end)] | @tsv' "$desired_file")
# 5) per-user .env (sticky port) + enable t3-serve@ # 5) per-user .env (sticky port) + enable t3-serve@

View file

@ -21,7 +21,13 @@ export DEBIAN_FRONTEND=noninteractive
apt-get update -qq apt-get update -qq
apt-get install -y "${PKGS[@]}" >/dev/null apt-get install -y "${PKGS[@]}" >/dev/null
# 2) node >= 18 + claude-code (claude-code requires node >= 18) # 2) node >= 18 — needed for the t3 CLI (npm-global, below). NOT for claude-code:
# claude-code is the per-user NATIVE install (the recommended, self-updating
# ~/.local/bin/claude), provisioned per user by t3-provision-users
# (install_user_claude_native) and self-bootstrapped by start-claude.sh on first launch.
# We deliberately do NOT `npm install -g @anthropic-ai/claude-code` — npm/npx is not the
# recommended runtime, and a system-wide npm copy just shadows/duplicates the per-user
# native installs everyone auto-migrates to anyway.
need_node=1 need_node=1
if command -v node >/dev/null; then if command -v node >/dev/null; then
[[ "$(node -v | sed 's/^v\([0-9]*\).*/\1/')" -ge 18 ]] && need_node=0 [[ "$(node -v | sed 's/^v\([0-9]*\).*/\1/')" -ge 18 ]] && need_node=0
@ -31,14 +37,6 @@ if [[ $need_node -eq 1 ]]; then
curl -fsSL https://deb.nodesource.com/setup_22.x | bash - >/dev/null curl -fsSL https://deb.nodesource.com/setup_22.x | bash - >/dev/null
apt-get install -y nodejs >/dev/null apt-get install -y nodejs >/dev/null
fi fi
# Detect the GLOBAL npm package, NOT whatever `claude` resolves to on PATH: the admin's
# personal ~/.local/bin/claude shadows it, so `command -v claude` silently skipped the
# system-wide install — leaving /usr/lib/node_modules/@anthropic-ai empty and fresh
# non-admins with no claude (they only worked because the admin's install was on PATH).
if ! npm ls -g --depth=0 @anthropic-ai/claude-code >/dev/null 2>&1; then
log "npm: installing @anthropic-ai/claude-code (system-wide)"
npm install -g @anthropic-ai/claude-code >/dev/null
fi
# 2b) t3 (the per-user coding surface) — PINNED, never nightly/latest. t3 is pre-1.0 and # 2b) t3 (the per-user coding surface) — PINNED, never nightly/latest. t3 is pre-1.0 and
# ships breaking auth-schema + bootstrap-API changes our t3-dispatch can't follow blind # ships breaking auth-schema + bootstrap-API changes our t3-dispatch can't follow blind

View file

@ -42,13 +42,16 @@ else
done done
fi fi
# Prefer the system-wide `claude` (installed by setup-devvm.sh); fall back to npx. # Run the NATIVE `claude` (the recommended install: ~/.local/bin/claude, self-updating).
# No npm/npx. If the native binary is missing (a fresh account before the hourly reconcile
# has provisioned it), bootstrap it with the official native installer, then run it.
launch() { launch() {
if command -v claude >/dev/null 2>&1; then if ! command -v claude >/dev/null 2>&1; then
claude "$@" echo " Installing Claude Code (native) for $(id -un)"
else curl -fsSL https://claude.ai/install.sh | bash || return 127
npx @anthropic-ai/claude-code "$@" export PATH="$HOME/.local/bin:$PATH"
fi fi
claude "$@"
} }
# Re-assert Claude Code's first-run onboarding flag before launch. ~/.claude.json is a # Re-assert Claude Code's first-run onboarding flag before launch. ~/.claude.json is a