Commit graph

12 commits

Author SHA1 Message Date
Viktor Barzin
ef555c7e02 workstation: put ~/.local/bin on PATH so the launcher finds native claude
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor hit "~/.local/bin is not part of the PATH". Root cause: the native claude
binary lives in ~/.local/bin, but the terminal launcher (start-claude.sh) runs in
tmux's NON-login bash env, which doesn't source the user's shell rc where the native
installer put ~/.local/bin on PATH. So `command -v claude` failed there → the
launcher's bootstrap re-ran the native installer → the installer printed the PATH
warning. (Interactive zsh already had ~/.local/bin via the per-user installer rc edit,
and t3-serve sets PATH in its unit — so only the terminal launcher was affected.)

- skel/start-claude.sh: prepend ~/.local/bin to PATH near the top (guarded/idempotent),
  before the launch logic — so `claude` is found, no reinstall, no warning.
- setup-devvm.sh: install /etc/profile.d/10-local-bin.sh — adds ~/.local/bin to PATH for
  all LOGIN shells machine-wide (SSH etc.), independent of the per-user installer rc edit
  (fresh-user-safe). zsh login picks it up via /etc/zsh/zprofile -> /etc/profile.
- docs/architecture/multi-tenancy.md: documented the three PATH-injection points.

Verified: guard adds-when-missing / no-dup-when-present; all scripts pass bash -n.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 17:20:03 +00:00
Viktor Barzin
eecd78233b workstation: standardize on the native claude install (drop npm-global + npx)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Question from Viktor: should claude run via the binary or npx? Answer: the native
install is the recommended runtime (self-contained, self-updating ~/.local/bin/claude;
installMethod=native) — and every existing user had already auto-migrated to it, leaving
the npm-global copy empty and the npx fallback dead. "Leave only the recommended setup":

- setup-devvm.sh: node is now installed ONLY for the t3 CLI; dropped the machine-wide
  `npm install -g @anthropic-ai/claude-code` (npm/npx is not the recommended runtime and
  just shadowed the per-user native installs).
- t3-provision-users.sh: new per-user `install_user_claude_native` (runs the official
  https://claude.ai/install.sh AS the user, idempotent/skip-if-present) — provisions native
  claude for BOTH the terminal launcher and each t3-serve instance, replacing the npm bootstrap.
- skel/start-claude.sh: launcher runs the native `claude` only; if missing it bootstraps via
  the native installer (was an `npx @anthropic-ai/claude-code` fallback).
- docs/architecture/multi-tenancy.md: documented the native-only runtime model.

node stays (the pinned t3 CLI is npm-global). Verified: native installer reachable +
produces ~/.local/bin/claude 2.1.177; all three scripts pass bash -n + shellcheck.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 17:12:05 +00:00
Viktor Barzin
312c418a9a workstation: setup-devvm.sh installs the systemd service layer (reproducible rebuild)
The t3 system units (t3-serve@, t3-autoupdate, t3-backup-state, t3-provision-users,
t3-dispatch) + the t3-dispatch Go binary + t3-mint + the sudoers grant were all
hand-scp'd and would NOT survive a fresh devvm. setup-devvm.sh now installs + enables
them: build-if-absent for the Go binary, visudo-validated sudoers (a malformed
/etc/sudoers.d file breaks all sudo), timers self-heal, t3-dispatch system account
created if absent. t3-serve@ stays a per-user template enabled by the provisioner;
the ttyd terminal-lobby chain ships from its own repo (viktor/terminal-lobby).

Verified: shellcheck clean, go build compiles, visudo parses the sudoers, units parse.
NOT run live (would re-assert apt/npm on the shared host) — exercised on next rebuild.

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 09:07:20 +00:00
Viktor Barzin
eb8695743b workstation: fix setup-devvm.sh provisioner correctness (claude detect, kubelogin pin, codex auth, t3-serve dir)
- claude-code: detect via `npm ls -g` not `command -v claude` — the admin's
  personal ~/.local/bin/claude shadowed the PATH check, so the system-wide
  install never ran (/usr/lib/node_modules/@anthropic-ai empty, no /usr/bin/claude;
  fresh non-admins had no claude). Found during the devvm reproducibility audit.
- kubelogin: pin v1.36.2 instead of releases/latest/download, so two fresh boxes
  built weeks apart are byte-identical.
- /etc/t3-serve: mkdir before the token writes (install -m doesn't create the
  parent — section 8 would fail on a fresh box).
- codex shared auth: stage /opt/codex-shared/auth.json from Vault
  secret/workstation.codex_shared_auth_json (key already existed but nothing
  consumed it — was a manual step lost on rebuild), mirroring the Claude token.

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:41:54 +00:00
Viktor Barzin
3e7093947d t3: bump pin 0.0.24 -> 0.0.26 (fable-5) [ci skip]
Completes the 0.0.26 adoption prepared in fcb84ce0 (version-agnostic
dispatch browser-session/bootstrap fallback + Gate-2 real pairing
health-check + per-user state.sqlite backup). 0.0.26 verified
end-to-end on the devvm: emo + ancamilea auto-pair via t3-dispatch
(302 + Set-Cookie t3_session) after migrating state.sqlite 30->32;
pre-cutover backups in /var/backups/t3-state. Brings claude-fable-5
into the t3 model picker.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:41:53 +00:00
Viktor Barzin
baac46415f t3: pin t3@0.0.24 + stop nightly auto-update (auth-outage fix) [ci skip]
The t3-autoupdate timer (re-enabled by the provisioner's step 5b with
`--now`, which fires the missed daily job immediately on a Persistent
timer) pulled t3@nightly 0.0.25 mid-day. That build ran forward schema
migrations on every ~/.t3 state.sqlite (auth_pairing_links/auth_sessions
role->scopes, +proof_key_thumbprint) AND changed the bootstrap API,
breaking t3-mint/pairing for ALL devvm users (pair prompt, no session).

- t3-autoupdate.sh: now a pinned-version ENFORCER (T3_PIN=0.0.24), not a
  nightly tracker -- re-asserts the pin (a no-op when correct).
- t3-provision-users.sh step 5b: drop `--now` (it triggered the
  immediate missed-job run that pulled the bad build).
- setup-devvm.sh: install pinned t3@0.0.24 at machine setup.
- unit Descriptions + service-catalog reflect the pin.
- post-mortem: 2026-06-09-t3-nightly-autoupdate-auth-outage.md.

Host already reconciled out-of-band: rolled back to 0.0.24, re-enabled
the (now-pinned) enforcer, reset the 2 new users' disposable DBs,
surgically reverted wizard's auth tables to level-30 (96 threads + live
session preserved). All users verified 302 + t3_session.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:41:53 +00:00
Viktor Barzin
39e35ca8c9 workstation: share admin Claude subscription with non-admins (CLAUDE_CODE_OAUTH_TOKEN)
Non-admins without their own ~/.claude login get the shared long-lived sk-ant-oat01 token injected into their t3-serve env, so their agent authenticates against the admin's subscription. setup-devvm.sh stages it from Vault secret/workstation.claude_oauth_token (root-readable); the provisioner's install_user_claude_token injects per-user, if-absent (never clobbers emo's own login). Live-fixed anca (verified AUTHOK); this codifies it for reproducibility + future users. NOT pushed (shared-tree divergence hold).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:41:53 +00:00
Viktor Barzin
fd0f4a0365 fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]
6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:45:33 +00:00
Viktor Barzin
6d224861c4 stem95su: scheduled Drive->site sync CronJob (every 10m)
CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app published to Production or the
refresh token expires ~weekly.

Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:42:26 +00:00
Viktor Barzin
06f5c12476 workstation: setup-devvm.sh hardens the admin's unlocked tree (o-rx, not world-readable)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
Codifies the leak fix found during the emo cutover: /home/wizard/code is git-crypt-DECRYPTED in the admin's working tree, but was mode 0775 (o+rx) — so any devvm user (even outside code-shared) could read decrypted secrets by path (verified: emo read certificate.pfx as plaintext DER). setup-devvm.sh now chmod o-rx the admin tree so a rebuild keeps it. Live fix already applied (now drwxrws---).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 18:08:52 +00:00
Viktor Barzin
2c1865eabb workstation: roster-driven provisioner (SSoT reconcile, additive-only)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
t3-provision-users.sh now consumes roster_engine.py: derives accounts + per-tier groups + sticky ports + /etc/ttyd-user-map + dispatch.json from roster.yaml and applies them. ADDITIVE-ONLY for existing users (never strips a group, replaces a home, or re-locks an account) so the hourly timer is always safe. Best-effort tier validation vs live k8s_users: warns on a net-new absent user (emo), aborts only on a real tier conflict, skips when root has no Vault token. DRY_RUN mode for safe testing. Verified on the live host: reproduces dispatch.json content exactly, emo/anca groups + all t3-serve instances unchanged, idempotent, shellcheck-clean; deployed to /usr/local/bin (hourly timer target).

Engine: validate_tiers now returns ValidationIssue(severity) — error=conflict (abort) vs warn=absent (grant pending) — + has_blocking_errors(); 28 pytest cases. setup-devvm.sh redeploys the provisioner for reproducibility.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 14:18:12 +00:00
Viktor Barzin
1757cb59e7 workstation: machine-wide config inheritance (managed claudeMd + setup-devvm.sh + skel)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
Spike confirmed (claude 2.1.168): /etc/claude-code/managed-settings.json claudeMd reaches a session (sentinel echoed). Hybrid inheritance = enforced org claudeMd machine-wide (top precedence, non-overridable) + per-user ~/.claude/{skills,rules,...} symlinks to the config base (live, the proven emo pattern) seeded via /etc/skel. setup-devvm.sh is idempotent: apt toolset, node>=18 + claude-code, system-wide kubelogin (NOT the Azure apt pkg), the managed config, and /etc/skel (launcher that cd's $HOME/code, tmux UX, inheritance symlinks). Verified: emo unchanged (groups/symlinks/live sessions intact), emo can read the managed config, idempotent re-run clean.

Security fix (host state): /home/wizard/.claude/settings.json was 0664, exposing MEMORY_API_KEY to all devvm users -> chmod 0600. chezmoi source needs a private_ prefix + the key templated out to persist this (dotfiles-repo follow-up).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 14:07:04 +00:00