infra

Author	SHA1	Message	Date
Viktor Barzin	ef555c7e02	workstation: put ~/.local/bin on PATH so the launcher finds native claude All checks were successful ci/woodpecker/push/default Pipeline was successful Details Viktor hit "~/.local/bin is not part of the PATH". Root cause: the native claude binary lives in ~/.local/bin, but the terminal launcher (start-claude.sh) runs in tmux's NON-login bash env, which doesn't source the user's shell rc where the native installer put ~/.local/bin on PATH. So `command -v claude` failed there → the launcher's bootstrap re-ran the native installer → the installer printed the PATH warning. (Interactive zsh already had ~/.local/bin via the per-user installer rc edit, and t3-serve sets PATH in its unit — so only the terminal launcher was affected.) - skel/start-claude.sh: prepend ~/.local/bin to PATH near the top (guarded/idempotent), before the launch logic — so `claude` is found, no reinstall, no warning. - setup-devvm.sh: install /etc/profile.d/10-local-bin.sh — adds ~/.local/bin to PATH for all LOGIN shells machine-wide (SSH etc.), independent of the per-user installer rc edit (fresh-user-safe). zsh login picks it up via /etc/zsh/zprofile -> /etc/profile. - docs/architecture/multi-tenancy.md: documented the three PATH-injection points. Verified: guard adds-when-missing / no-dup-when-present; all scripts pass bash -n. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 17:20:03 +00:00
Viktor Barzin	eecd78233b	workstation: standardize on the native claude install (drop npm-global + npx) All checks were successful ci/woodpecker/push/default Pipeline was successful Details Question from Viktor: should claude run via the binary or npx? Answer: the native install is the recommended runtime (self-contained, self-updating ~/.local/bin/claude; installMethod=native) — and every existing user had already auto-migrated to it, leaving the npm-global copy empty and the npx fallback dead. "Leave only the recommended setup": - setup-devvm.sh: node is now installed ONLY for the t3 CLI; dropped the machine-wide `npm install -g @anthropic-ai/claude-code` (npm/npx is not the recommended runtime and just shadowed the per-user native installs). - t3-provision-users.sh: new per-user `install_user_claude_native` (runs the official https://claude.ai/install.sh AS the user, idempotent/skip-if-present) — provisions native claude for BOTH the terminal launcher and each t3-serve instance, replacing the npm bootstrap. - skel/start-claude.sh: launcher runs the native `claude` only; if missing it bootstraps via the native installer (was an `npx @anthropic-ai/claude-code` fallback). - docs/architecture/multi-tenancy.md: documented the native-only runtime model. node stays (the pinned t3 CLI is npm-global). Verified: native installer reachable + produces ~/.local/bin/claude 2.1.177; all three scripts pass bash -n + shellcheck. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 17:12:05 +00:00
Viktor Barzin	bb3f5f2329	workstation: stop the Claude Code onboarding wizard reappearing for terminal users All checks were successful ci/woodpecker/push/default Pipeline was successful Details emo reported being "logged out" on terminal.viktorbarzin.me: every new shell dropped him at the first-run "Choose the text style" wizard, even though he'd used many sessions and is in fact fully authenticated. Root cause is NOT a logout — ~/.claude.json is a single file that all of a user's concurrent claude processes (the ttyd terminal + their t3-serve instance + agent sessions) read-modify-write, and a stale writer periodically drops top-level keys, including hasCompletedOnboarding. That bounces the next interactive session back to onboarding; credentials are safe in the separate ~/.claude/.credentials.json (which is why T3 kept working). wizard's own ~/.claude.json showed the same key loss, so this hits any heavy multi-session user. Fix: - skel/start-claude.sh: ensure_onboarding() idempotently re-asserts hasCompletedOnboarding (+ lastOnboardingVersion) in ~/.claude.json right before launching claude. Merge-only (never clobbers other keys), runs as the user, and no-ops if jq is missing or the file is empty/corrupt. So even if the race drops the flag, the next launch restores it before claude reads it. - t3-provision-users.sh: deploy_user_launcher() re-copies skel/start-claude.sh into every non-admin home (copy-if-changed) on the hourly reconcile. /etc/skel only seeds the launcher at account creation, so without this the fix (and any future launcher edit) would never reach existing users. .tmux.conf is deliberately not re-copied — terminal-lobby appends a managed section to it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 14:37:59 +00:00
Viktor Barzin	5f7c2964ac	workstation: session-launch freshen follows the checked-out branch (not just master) Viktor asked to log Anca into her GitHub account so she can develop on the devvm and deploy her apps through the existing CI/CD. Her GitHub repos (Plotting-Your-Dream-Book, travel, My-Wardrobe — now cloned into her ~/code workspace) default to main, and the launcher freshen only fast-forwarded master, silently skipping them. ff the current branch's upstream instead — same safety gates (on a branch, clean tree, upstream configured, ff-only). Single-layout infra clones behave identically. [ci skip]	2026-06-10 18:20:59 +00:00
Viktor Barzin	2825cb1703	workstation: per-user code_layout — workspace puts project repos under ~/code (ancamilea + tripit) Viktor asked to restructure Anca's setup: her ~/code WAS the infra clone itself; he wants ~/code to be the directory where all her project repos (tripit etc.) live side by side, with infra moved to a subdirectory. - roster.yaml gains per-user 'code_layout: single\|workspace' + 'repos', validated + derived by roster_engine.py (12 new tests, 40 total). - t3-provision-users reconcile: auto-migrates a single-layout ~/code to ~/code/infra (running processes follow the moved inode), hoists nested project clones to the workspace root, clones roster repos from Forgejo AS the user (their PAT makes private repos work), and wires the documented forgejo remote + forgejo/master upstream into clones that predate that contract. - Fixed a latent TSV bug: empty jq @tsv fields collapse under tab-IFS read, shifting later fields left (groups was only safe by being the last field) — emit '-' sentinels instead. - start-claude.sh session freshen is layout-aware (freshens each repo under ~/code for workspace users). - managed claudeMd + AGENTS.md non-admin recipe + multi-tenancy.md updated in the same change. Applied live: ancamilea = workspace (infra at ~/code/infra, her existing tripit clone hoisted to ~/code/tripit, master upstream switched to forgejo/master); emo stays single layout, untouched. [ci skip]	2026-06-10 18:05:31 +00:00
Viktor Barzin	2e5af5dc0e	workstation: keep non-admin infra clones fresh (hourly + at launch) [ci skip] Non-admins (emo) need current master without manual pulls. Two layers: - t3-provision-users reconcile gains refresh_locked_clone: fetch all remotes + ff-only master, guarded (on master, clean tree, upstream set); dirty/diverged clones are left alone with a WARN. - start-claude.sh freshens ~/code at session launch, 15s-capped so an offline remote never delays the session. Verified live on emo's clone: stale clone ff'd to tip by the reconciler; launcher snippet ff's when clean and refuses while a dirty file exists. Deployed to /usr/local/bin/t3-provision-users, /etc/skel/start-claude.sh, and emo's launcher. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 09:41:38 +00:00
Viktor Barzin	68a237faf7	workstation: skel start-claude.sh inherits managed default model (drop hardcoded --model) The per-user launcher hardcoded --model claude-opus-4-8; an explicit --model flag overrides the managed default in /etc/claude-code/managed-settings.json (claude-fable-5). Dropping it lets emo and all new accounts inherit the org default (per-session /model still works). Deployed to /etc/skel and emo live copy in the same change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 19:35:29 +00:00
Viktor Barzin	fd0f4a0365	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip] `6d224861` came from a --no-checkout worktree whose empty index made the commit drop every file except two. This restores 05b50d2b's full tree and correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the live infra was never applied from the broken commit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 08:45:33 +00:00
Viktor Barzin	6d224861c4	stem95su: scheduled Drive->site sync CronJob (every 10m) CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard + --max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault secret/stem95su. Requires the GCP OAuth app published to Production or the refresh token expires ~weekly. Lands the gdrive-sync stack on master (it had landed on a feature branch by accident on the shared devvm checkout). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 08:42:26 +00:00
Viktor Barzin	1757cb59e7	workstation: machine-wide config inheritance (managed claudeMd + setup-devvm.sh + skel) All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Spike confirmed (claude 2.1.168): /etc/claude-code/managed-settings.json claudeMd reaches a session (sentinel echoed). Hybrid inheritance = enforced org claudeMd machine-wide (top precedence, non-overridable) + per-user ~/.claude/{skills,rules,...} symlinks to the config base (live, the proven emo pattern) seeded via /etc/skel. setup-devvm.sh is idempotent: apt toolset, node>=18 + claude-code, system-wide kubelogin (NOT the Azure apt pkg), the managed config, and /etc/skel (launcher that cd's $HOME/code, tmux UX, inheritance symlinks). Verified: emo unchanged (groups/symlinks/live sessions intact), emo can read the managed config, idempotent re-run clean. Security fix (host state): /home/wizard/.claude/settings.json was 0664, exposing MEMORY_API_KEY to all devvm users -> chmod 0600. chezmoi source needs a private_ prefix + the key templated out to persist this (dotfiles-repo follow-up). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 14:07:04 +00:00

10 commits