diff --git a/docs/architecture/multi-tenancy.md b/docs/architecture/multi-tenancy.md index baaf8007..17163820 100644 --- a/docs/architecture/multi-tenancy.md +++ b/docs/architecture/multi-tenancy.md @@ -545,6 +545,8 @@ Separate from the in-cluster namespace-owner model above, the **devvm** (`10.0.1 **Onboarding state self-heals (2026-06-15):** `~/.claude.json` is a single file that ALL of a user's concurrent `claude` processes (the ttyd terminal + their `t3-serve` instance + agent/SDK sessions) read-modify-write, so a stale writer periodically drops top-level keys — including `hasCompletedOnboarding` — which bounces the next *interactive* session back to the first-run "Choose the text style" wizard even though the user is fully logged in (credentials live in the SEPARATE `~/.claude/.credentials.json`, untouched by the race; first observed for emo 2026-06-15). The launcher (`skel/start-claude.sh`) now idempotently re-asserts `hasCompletedOnboarding` (+ `lastOnboardingVersion`) in `~/.claude.json` right before it runs `claude` — merge-only, never clobbers other keys, no-op if jq is missing or the file is empty/corrupt. And since the launcher is a per-user copy that `/etc/skel` only seeds at account creation, the reconcile's new `deploy_user_launcher` step re-copies `skel/start-claude.sh` into every non-admin home (copy-if-changed) so launcher edits now reach EXISTING users within the hour — `.tmux.conf` is deliberately NOT re-copied (terminal-lobby appends its own managed section to it). +**Claude Code runtime — native, per-user (2026-06-15):** `claude` is the **native** install (`~/.local/bin/claude` → `~/.local/share/claude/versions/`, self-updating; `installMethod: native`) — NOT npm-global or npx. It is the runtime for both the ttyd launcher and each `t3-serve` instance. `setup-devvm.sh` installs node ONLY for the `t3` CLI (not claude); per-user native claude is provisioned by the reconcile's `install_user_claude_native` (covers terminal + t3, idempotent, skip-if-present) and self-bootstrapped by `start-claude.sh` on first launch — both via the official `https://claude.ai/install.sh`. The legacy machine-wide `npm install -g @anthropic-ai/claude-code` bootstrap and the launcher's `npx` fallback were removed; existing users had already auto-migrated to native, and the npm-global dir was empty. + **Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext. Its location depends on the per-user `code_layout` in `roster.yaml`: `single` (default) puts the clone AT `~/code`; `workspace` makes `~/code` a plain directory of per-project clones — the infra clone at `~/code/infra` plus each roster `repos` entry cloned from Forgejo `viktor/` **as the user** (their PAT authenticates, so private repos work; clone failures WARN and retry next hour). Flipping a user to `workspace` auto-migrates their existing `~/code` clone to `~/code/infra` (local branches/dirty state survive; running processes follow the moved inode). ancamilea = workspace + `tripit` since 2026-06-10. The provisioner clones infra anonymously from the public GitHub mirror; **contribute access is wired per-user on top** (see below). The apply boundary still holds (`scripts/tg apply` needs an admin Vault token + cluster RBAC), but **pushing `master` is NOT inert** — the Forgejo→Woodpecker webhook fires `.woodpecker/default.yml` (`event: push, branch: master`, `require_approval: forks` only), which terragrunt-applies changed stacks. `master` is **branch-protected on Forgejo** (force-push disabled for everyone — history is append-only; push + merge whitelists = `viktor` + explicitly granted users, deploy keys allowed). **Allow-then-audit (Viktor, 2026-06-10):** `ebarzin` (emo) is on the whitelist and pushes straight to `master` — no PR gate. The tracking burden moves to: (a) **commit messages that record what + why** (the agent instructions in AGENTS.md and the managed claudeMd require the body to paraphrase the user's request), (b) the **`notify-nonadmin-push` Slack audit step** in `.woodpecker/default.yml` — every master push by a non-admin author is posted to Slack (admin pushes are not), and (c) non-admins **never use `[ci skip]`** so every change fires the pipeline (and thus the audit feed). Users NOT on the whitelist fall back to `/` branches + PRs. **Clones stay fresh automatically** (2026-06-10): the hourly `t3-provision-users` reconcile runs `refresh_user_clone` over every managed clone — the infra clone and any workspace repos (fetch all remotes + fast-forward `master`, ONLY when on master with a clean tree and an upstream — dirty trees and local commits are left alone with a WARN) — and also `wire_forgejo_remote`, which idempotently adds the documented `forgejo` remote + `forgejo/master` upstream to infra clones that predate that contract. `start-claude.sh` does the same freshen at session launch (10s fetch cap per repo so an offline remote never stalls the session; workspace layouts freshen each repo under `~/code`). **Contribute access (per non-admin, manual — the anca/tripit PAT precedent):** diff --git a/scripts/t3-provision-users.sh b/scripts/t3-provision-users.sh index 593de0f9..c5bbe4a9 100644 --- a/scripts/t3-provision-users.sh +++ b/scripts/t3-provision-users.sh @@ -288,6 +288,25 @@ deploy_user_launcher() { log "deployed start-claude.sh -> $user" } +# Ensure the per-user NATIVE claude install (the recommended runtime: ~user/.local/bin/claude, +# self-updating) — used by BOTH the terminal launcher AND the user's t3-serve instance. We do +# NOT npm-install claude system-wide (npm/npx isn't the recommended runtime); each user gets +# their own native install. Idempotent: skip if already present. Runs the official native +# installer AS the user (into their ~/.local). Best-effort: a failure WARNs and retries next +# reconcile (start-claude.sh also self-bootstraps the terminal path). +install_user_claude_native() { + local user="$1" home + home="$(getent passwd "$user" | cut -d: -f6)" + [[ -n "$home" && -d "$home" ]] || return 0 + [[ -x "$home/.local/bin/claude" ]] && return 0 # already native -> done + if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] native claude install -> $user"; return 0; fi + if runuser -u "$user" -- bash -lc 'curl -fsSL https://claude.ai/install.sh | bash' >/dev/null 2>&1; then + log "installed native claude -> $user" + else + log "WARN: native claude install failed for $user (retries next reconcile)" + fi +} + [[ $EUID -eq 0 ]] || { echo "t3-provision-users: must run as root" >&2; exit 1; } for bin in python3 jq; do command -v "$bin" >/dev/null || { echo "missing $bin" >&2; exit 1; }; done [[ -f "$ROSTER" && -f "$ENGINE" ]] || { echo "roster/engine not under $WORKSTATION_DIR" >&2; exit 1; } @@ -367,6 +386,7 @@ while IFS=$'\t' read -r os_user tier shell groups_csv code_layout repos_csv; do deploy_user_launcher "$os_user" # keep ~/start-claude.sh current (skel only seeds new accounts) fi refresh_codex_mirror "$os_user" # all tiers — mirror of the managed claudeMd + install_user_claude_native "$os_user" # all tiers — per-user native claude (terminal + t3); no npm/npx done < <(jq -r '.accounts[] | [.os_user, .tier, .shell, (if (.groups|length)==0 then "-" else (.groups|join(",")) end), .code_layout, (if (.repos|length)==0 then "-" else (.repos|join(",")) end)] | @tsv' "$desired_file") # 5) per-user .env (sticky port) + enable t3-serve@ diff --git a/scripts/workstation/setup-devvm.sh b/scripts/workstation/setup-devvm.sh index 4bf6908b..be6e0e12 100755 --- a/scripts/workstation/setup-devvm.sh +++ b/scripts/workstation/setup-devvm.sh @@ -21,7 +21,13 @@ export DEBIAN_FRONTEND=noninteractive apt-get update -qq apt-get install -y "${PKGS[@]}" >/dev/null -# 2) node >= 18 + claude-code (claude-code requires node >= 18) +# 2) node >= 18 — needed for the t3 CLI (npm-global, below). NOT for claude-code: +# claude-code is the per-user NATIVE install (the recommended, self-updating +# ~/.local/bin/claude), provisioned per user by t3-provision-users +# (install_user_claude_native) and self-bootstrapped by start-claude.sh on first launch. +# We deliberately do NOT `npm install -g @anthropic-ai/claude-code` — npm/npx is not the +# recommended runtime, and a system-wide npm copy just shadows/duplicates the per-user +# native installs everyone auto-migrates to anyway. need_node=1 if command -v node >/dev/null; then [[ "$(node -v | sed 's/^v\([0-9]*\).*/\1/')" -ge 18 ]] && need_node=0 @@ -31,14 +37,6 @@ if [[ $need_node -eq 1 ]]; then curl -fsSL https://deb.nodesource.com/setup_22.x | bash - >/dev/null apt-get install -y nodejs >/dev/null fi -# Detect the GLOBAL npm package, NOT whatever `claude` resolves to on PATH: the admin's -# personal ~/.local/bin/claude shadows it, so `command -v claude` silently skipped the -# system-wide install — leaving /usr/lib/node_modules/@anthropic-ai empty and fresh -# non-admins with no claude (they only worked because the admin's install was on PATH). -if ! npm ls -g --depth=0 @anthropic-ai/claude-code >/dev/null 2>&1; then - log "npm: installing @anthropic-ai/claude-code (system-wide)" - npm install -g @anthropic-ai/claude-code >/dev/null -fi # 2b) t3 (the per-user coding surface) — PINNED, never nightly/latest. t3 is pre-1.0 and # ships breaking auth-schema + bootstrap-API changes our t3-dispatch can't follow blind diff --git a/scripts/workstation/skel/start-claude.sh b/scripts/workstation/skel/start-claude.sh index dcd716fb..2353eace 100755 --- a/scripts/workstation/skel/start-claude.sh +++ b/scripts/workstation/skel/start-claude.sh @@ -42,13 +42,16 @@ else done fi -# Prefer the system-wide `claude` (installed by setup-devvm.sh); fall back to npx. +# Run the NATIVE `claude` (the recommended install: ~/.local/bin/claude, self-updating). +# No npm/npx. If the native binary is missing (a fresh account before the hourly reconcile +# has provisioned it), bootstrap it with the official native installer, then run it. launch() { - if command -v claude >/dev/null 2>&1; then - claude "$@" - else - npx @anthropic-ai/claude-code "$@" + if ! command -v claude >/dev/null 2>&1; then + echo " Installing Claude Code (native) for $(id -un) …" + curl -fsSL https://claude.ai/install.sh | bash || return 127 + export PATH="$HOME/.local/bin:$PATH" fi + claude "$@" } # Re-assert Claude Code's first-run onboarding flag before launch. ~/.claude.json is a