diff --git a/AGENTS.md b/AGENTS.md
index 054a6ded..43d4cf3e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -231,10 +231,18 @@ Per-workload opt-out: add the label `keel.sh/policy: never` on the Deployment me
 Non-admin devvm users (power-user / namespace-owner tiers) may not know git at
 all. Their agent handles every version-control step silently — never ask them
 to commit, push, pull, or open a PR, and never surface git jargon at them.
-Their `~/code` clone arrives preconfigured: git identity, a `forgejo` remote
+Their infra clone arrives preconfigured: git identity, a `forgejo` remote
 authenticated via `~/.git-credentials`, and `master` tracking `forgejo/master`
 (auto-freshened hourly and at session launch, fast-forward only).
 
+Two per-user layouts exist (`code_layout` in
+`scripts/workstation/roster.yaml`): `single` (the default) — `~/code` IS the
+locked infra clone — and `workspace` — `~/code` is a plain directory of
+per-project clones: the infra clone at `~/code/infra`, plus each roster
+`repos` entry (e.g. `~/code/tripit`) cloned from Forgejo `viktor/` with
+the user's own PAT. The reconcile auto-migrates a single-layout `~/code` when
+a user is flipped to `workspace`, and keeps every clone fresh either way.
+
 The model is **allow-then-audit** (Viktor, 2026-06-10): whitelisted users (emo)
 push straight to `master` — no PR gate — and the record of *what changed and
 why* is what matters. Force-push is disabled for everyone, so master history
diff --git a/docs/architecture/multi-tenancy.md b/docs/architecture/multi-tenancy.md
index b2ee5de1..03a51f9e 100644
--- a/docs/architecture/multi-tenancy.md
+++ b/docs/architecture/multi-tenancy.md
@@ -543,19 +543,19 @@ Separate from the in-cluster namespace-owner model above, the **devvm** (`10.0.1
 
 **Config inheritance (live):** wizard authors the base (his chezmoi-versioned `~/.claude`). Two native layers carry it to every user — the enforced org `claudeMd` in `/etc/claude-code/managed-settings.json` (top precedence, all sessions) and per-user `~/.claude/{skills,rules,…}` **symlinks** to the base (seeded via `/etc/skel`; edits propagate live). Secrets stay per-user at mode 600, never symlinked. **The managed config self-deploys from the repo** (2026-06-10): the hourly reconcile's `sync_managed_config` installs `scripts/workstation/managed-settings.json` to `/etc/claude-code/` whenever the repo copy changes — so editing the claudeMd = edit + commit, no manual install — and `refresh_codex_mirror` regenerates each user's `~/.codex/AGENTS.md` (a static mirror of the claudeMd; only files carrying the mirror header are touched, user-customized ones are left alone). Repo-level guidance (`.claude/CLAUDE.md`, `AGENTS.md`, `CONTEXT.md` in the infra repo) reaches non-admins through their auto-freshened clones — commit + push and every user has it within the hour.
 
-**Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo at `~/code` — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext. The provisioner clones anonymously from the public GitHub mirror; **contribute access is wired per-user on top** (see below). The apply boundary still holds (`scripts/tg apply` needs an admin Vault token + cluster RBAC), but **pushing `master` is NOT inert** — the Forgejo→Woodpecker webhook fires `.woodpecker/default.yml` (`event: push, branch: master`, `require_approval: forks` only), which terragrunt-applies changed stacks. `master` is **branch-protected on Forgejo** (force-push disabled for everyone — history is append-only; push + merge whitelists = `viktor` + explicitly granted users, deploy keys allowed). **Allow-then-audit (Viktor, 2026-06-10):** `ebarzin` (emo) is on the whitelist and pushes straight to `master` — no PR gate. The tracking burden moves to: (a) **commit messages that record what + why** (the agent instructions in AGENTS.md and the managed claudeMd require the body to paraphrase the user's request), (b) the **`notify-nonadmin-push` Slack audit step** in `.woodpecker/default.yml` — every master push by a non-admin author is posted to Slack (admin pushes are not), and (c) non-admins **never use `[ci skip]`** so every change fires the pipeline (and thus the audit feed). Users NOT on the whitelist fall back to `/` branches + PRs. **Clones stay fresh automatically** (2026-06-10): the hourly `t3-provision-users` reconcile runs `refresh_locked_clone` (fetch all remotes + fast-forward `master`, ONLY when on master with a clean tree and an upstream — dirty trees and local commits are left alone with a WARN), and `start-claude.sh` does the same freshen at session launch (15s-capped so an offline remote never stalls the session).
+**Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext. Its location depends on the per-user `code_layout` in `roster.yaml`: `single` (default) puts the clone AT `~/code`; `workspace` makes `~/code` a plain directory of per-project clones — the infra clone at `~/code/infra` plus each roster `repos` entry cloned from Forgejo `viktor/` **as the user** (their PAT authenticates, so private repos work; clone failures WARN and retry next hour). Flipping a user to `workspace` auto-migrates their existing `~/code` clone to `~/code/infra` (local branches/dirty state survive; running processes follow the moved inode). ancamilea = workspace + `tripit` since 2026-06-10. The provisioner clones infra anonymously from the public GitHub mirror; **contribute access is wired per-user on top** (see below). The apply boundary still holds (`scripts/tg apply` needs an admin Vault token + cluster RBAC), but **pushing `master` is NOT inert** — the Forgejo→Woodpecker webhook fires `.woodpecker/default.yml` (`event: push, branch: master`, `require_approval: forks` only), which terragrunt-applies changed stacks. `master` is **branch-protected on Forgejo** (force-push disabled for everyone — history is append-only; push + merge whitelists = `viktor` + explicitly granted users, deploy keys allowed). **Allow-then-audit (Viktor, 2026-06-10):** `ebarzin` (emo) is on the whitelist and pushes straight to `master` — no PR gate. The tracking burden moves to: (a) **commit messages that record what + why** (the agent instructions in AGENTS.md and the managed claudeMd require the body to paraphrase the user's request), (b) the **`notify-nonadmin-push` Slack audit step** in `.woodpecker/default.yml` — every master push by a non-admin author is posted to Slack (admin pushes are not), and (c) non-admins **never use `[ci skip]`** so every change fires the pipeline (and thus the audit feed). Users NOT on the whitelist fall back to `/` branches + PRs. **Clones stay fresh automatically** (2026-06-10): the hourly `t3-provision-users` reconcile runs `refresh_user_clone` over every managed clone — the infra clone and any workspace repos (fetch all remotes + fast-forward `master`, ONLY when on master with a clean tree and an upstream — dirty trees and local commits are left alone with a WARN) — and also `wire_forgejo_remote`, which idempotently adds the documented `forgejo` remote + `forgejo/master` upstream to infra clones that predate that contract. `start-claude.sh` does the same freshen at session launch (10s fetch cap per repo so an offline remote never stalls the session; workspace layouts freshen each repo under `~/code`).
 
 **Contribute access (per non-admin, manual — the anca/tripit PAT precedent):**
 1. Add their Forgejo user as a **write** collaborator on `viktor/infra` (`PUT /api/v1/repos/viktor/infra/collaborators/`).
 2. Mint a PAT — the admin REST endpoint 404s here, use the in-pod CLI: `kubectl -n forgejo exec deploy/forgejo -- su -s /bin/sh git -c "forgejo admin user generate-access-token --username --token-name devvm-infra-git --scopes 'write:repository'"`.
 3. Install it in their `~/.git-credentials` (`https://:@forgejo.viktorbarzin.me`, mode 600) + `git config --global credential.helper store`, set `user.name`/`user.email`.
-4. In their clone: `git remote add forgejo https://forgejo.viktorbarzin.me/viktor/infra.git` and `git branch --set-upstream-to=forgejo/master master` (origin stays the anonymous GitHub mirror).
+4. The reconcile wires the clone side automatically (`wire_forgejo_remote`): `forgejo` remote + `master` tracking `forgejo/master` on every non-admin infra clone (origin stays the anonymous GitHub mirror). No manual step since 2026-06-10.
 5. (Optional — Viktor's call per user) Grant direct master push: add their login to the `master` branch-protection push + merge whitelists (`PATCH /api/v1/repos/viktor/infra/branch_protections/master`). Done for `ebarzin` 2026-06-10.
 6. Verify: branch push succeeds; a `master` push succeeds for whitelisted users and is rejected with `Not allowed to push to protected branch` otherwise.
 
 **Web-terminal session persistence (2026-06-10):** the tmux-based web terminal's named sessions (each running one Claude conversation) survive devvm reboots — `tmux-persist-save.timer` (5-min) snapshots every terminal user's sessions (name, cwd, conversation uuid from argv or the cwd-slug transcript dir) to `/var/lib/tmux-persist/.tsv`, and `tmux-persist-restore.service` recreates missing sessions at boot with `claude --resume ` (per-session idempotent; also handles partial loss). This is a **tmux/terminal-surface** feature, deliberately outside the t3 namespace: the t3 chat surface persists its own threads (`~/.t3` state, plus the daily `t3-backup-state` dump), and Claude conversations themselves were always durable (`~/.claude/projects/`) — what this adds is the volatile tmux wiring.
 
-**Status (2026-06-10):** built + verified on the live host — capacity (8 GiB swap), config inheritance, roster-driven provisioner, per-user locked clone, per-user OIDC kubeconfig + the `oidc-power-user-readonly` ClusterRole + emo's `k8s_users` entry (applied + impersonation-verified), the Authentik `T3 Users` edge gate, **the emo Phase-5 cutover (own clone + launcher repoint + `code-shared` removal, completed 2026-06-10) and emo's contribute access (`ebarzin` write collaborator + PAT + protected `master`)**. Per the live `/etc/skel` design, non-admin `~/.claude/{rules,skills}` symlinks into the admin base are **kept** (they ARE the shared-base delivery mechanism — the plan's step to remove them is obsolete). **Remaining (held / future):** the offboarding apply-side (Phase 7), per-user MCP/auth injection, and roster-reconciled `T3 Users` membership. See `../runbooks/offboard-user.md` for deprovisioning.
+**Status (2026-06-10):** built + verified on the live host — capacity (8 GiB swap), config inheritance, roster-driven provisioner, per-user locked clone, per-user OIDC kubeconfig + the `oidc-power-user-readonly` ClusterRole + emo's `k8s_users` entry (applied + impersonation-verified), the Authentik `T3 Users` edge gate, **the emo Phase-5 cutover (own clone + launcher repoint + `code-shared` removal, completed 2026-06-10) and emo's contribute access (`ebarzin` write collaborator + PAT + protected `master`)**, and **per-user `code_layout` with the ancamilea workspace cutover (infra → `~/code/infra`, `tripit` alongside, 2026-06-10)**. Per the live `/etc/skel` design, non-admin `~/.claude/{rules,skills}` symlinks into the admin base are **kept** (they ARE the shared-base delivery mechanism — the plan's step to remove them is obsolete). **Remaining (held / future):** the offboarding apply-side (Phase 7), per-user MCP/auth injection, and roster-reconciled `T3 Users` membership. See `../runbooks/offboard-user.md` for deprovisioning.
 
 ## Related
 
diff --git a/scripts/t3-provision-users.sh b/scripts/t3-provision-users.sh
index 0f5d18a3..31bc6f08 100644
--- a/scripts/t3-provision-users.sh
+++ b/scripts/t3-provision-users.sh
@@ -20,6 +20,12 @@ MAP=/etc/ttyd-user-map
 DRY_RUN="${DRY_RUN:-0}"
 # Public infra repo for the locked clone (no auth; the monorepo has no remote).
 INFRA_REMOTE="${INFRA_REMOTE:-https://github.com/ViktorBarzin/infra.git}"
+# Canonical push target for non-admin infra clones (AGENTS.md "Non-admin
+# workstation users"), and the base URL for workspace-layout `repos` entries —
+# those clone AS the user so their ~/.git-credentials PAT authenticates
+# against private Forgejo repos.
+FORGEJO_INFRA_REMOTE="${FORGEJO_INFRA_REMOTE:-https://forgejo.viktorbarzin.me/viktor/infra.git}"
+REPO_REMOTE_BASE="${REPO_REMOTE_BASE:-https://forgejo.viktorbarzin.me/viktor}"
 # Per-user OIDC kubeconfig (kubelogin/PKCE; cluster server+CA copied from the admin kubeconfig).
 OIDC_ISSUER="${OIDC_ISSUER:-https://authentik.viktorbarzin.me/application/o/kubernetes/}"
 ADMIN_KUBECONFIG="${ADMIN_KUBECONFIG:-/home/wizard/.kube/config}"
@@ -27,22 +33,24 @@ ADMIN_KUBECONFIG="${ADMIN_KUBECONFIG:-/home/wizard/.kube/config}"
 log() { echo "[t3-provision] $*"; }
 run() { if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] $*"; else "$@"; fi; }
 
-# Per-non-admin writable, git-crypt-LOCKED infra clone at ~/code. Keyless +
+# Per-non-admin writable, git-crypt-LOCKED infra clone at ~/. Keyless +
 # filter=cat ⇒ code/docs are plaintext, git-crypt'd secret files stay ciphertext.
 # Writable + ungated (push != apply; applies are admin-only). NEVER touches an
-# existing ~/code (so emo's symlink survives until the gated cutover).
+# existing target (so emo's symlink survives until the gated cutover). subpath
+# is "code" (single layout) or "code/infra" (workspace layout).
 install_locked_clone() {
-  local user="$1" home
+  local user="$1" sub="$2" home dst
   home="$(getent passwd "$user" | cut -d: -f6)"
   [[ -z "$home" ]] && return 0
-  [[ -e "$home/code" || -L "$home/code" ]] && return 0
-  if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] locked infra clone -> $user:$home/code"; return 0; fi
-  log "clone locked infra -> $user:~/code"
-  runuser -u "$user" -- git clone --quiet --no-checkout "$INFRA_REMOTE" "$home/code"
-  runuser -u "$user" -- git -C "$home/code" config filter.git-crypt.smudge cat
-  runuser -u "$user" -- git -C "$home/code" config filter.git-crypt.clean cat
-  runuser -u "$user" -- git -C "$home/code" config filter.git-crypt.required false
-  runuser -u "$user" -- git -C "$home/code" checkout --quiet master
+  dst="$home/$sub"
+  [[ -e "$dst" || -L "$dst" ]] && return 0
+  if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] locked infra clone -> $user:$dst"; return 0; fi
+  log "clone locked infra -> $user:~/$sub"
+  runuser -u "$user" -- git clone --quiet --no-checkout "$INFRA_REMOTE" "$dst"
+  runuser -u "$user" -- git -C "$dst" config filter.git-crypt.smudge cat
+  runuser -u "$user" -- git -C "$dst" config filter.git-crypt.clean cat
+  runuser -u "$user" -- git -C "$dst" config filter.git-crypt.required false
+  runuser -u "$user" -- git -C "$dst" checkout --quiet master
 }
 
 # Keep an EXISTING non-admin clone fresh (the admin's tree is never touched): fetch
@@ -50,18 +58,98 @@ install_locked_clone() {
 # clean tree, upstream configured. Never rebases/merges; a non-ff master (local
 # commits) is the user's to reconcile and is only WARNed about. Fetch failures
 # (offline, missing credentials) are non-fatal: freshness is best-effort.
-refresh_locked_clone() {
-  local user="$1" home
+refresh_user_clone() {
+  local user="$1" sub="$2" home dir
   home="$(getent passwd "$user" | cut -d: -f6)"
-  [[ -n "$home" && -d "$home/code/.git" ]] || return 0
-  if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] refresh clone -> $user:$home/code"; return 0; fi
-  runuser -u "$user" -- env GIT_TERMINAL_PROMPT=0 git -C "$home/code" fetch --all --prune --quiet 2>/dev/null \
-    || { log "WARN: clone fetch failed for $user (offline/credentials?) — skipped"; return 0; }
-  [[ "$(runuser -u "$user" -- git -C "$home/code" symbolic-ref --short -q HEAD)" == master ]] || return 0
-  [[ -z "$(runuser -u "$user" -- git -C "$home/code" status --porcelain)" ]] || return 0
-  runuser -u "$user" -- git -C "$home/code" rev-parse --verify -q 'master@{upstream}' >/dev/null || return 0
-  runuser -u "$user" -- git -C "$home/code" merge --ff-only 'master@{upstream}' >/dev/null 2>&1 \
-    || log "WARN: $user master not fast-forwardable (local commits?) — left as-is"
+  dir="$home/$sub"
+  [[ -n "$home" && -d "$dir/.git" ]] || return 0
+  if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] refresh clone -> $user:$dir"; return 0; fi
+  runuser -u "$user" -- env GIT_TERMINAL_PROMPT=0 git -C "$dir" fetch --all --prune --quiet 2>/dev/null \
+    || { log "WARN: fetch failed for $user:$sub (offline/credentials?) — skipped"; return 0; }
+  [[ "$(runuser -u "$user" -- git -C "$dir" symbolic-ref --short -q HEAD)" == master ]] || return 0
+  [[ -z "$(runuser -u "$user" -- git -C "$dir" status --porcelain)" ]] || return 0
+  runuser -u "$user" -- git -C "$dir" rev-parse --verify -q 'master@{upstream}' >/dev/null || return 0
+  runuser -u "$user" -- git -C "$dir" merge --ff-only 'master@{upstream}' >/dev/null 2>&1 \
+    || log "WARN: $user:$sub master not fast-forwardable (local commits?) — left as-is"
+}
+
+# Non-admin infra clones are documented to carry a `forgejo` remote (the
+# canonical push target) with master tracking forgejo/master — see AGENTS.md
+# "Non-admin workstation users". Clones made before that contract only have
+# the GitHub origin; wire the remote + upstream idempotently. Best-effort: an
+# offline fetch leaves the upstream as-is.
+wire_forgejo_remote() {
+  local user="$1" sub="$2" home dir
+  home="$(getent passwd "$user" | cut -d: -f6)"
+  dir="$home/$sub"
+  [[ -n "$home" && -d "$dir/.git" ]] || return 0
+  if ! runuser -u "$user" -- git -C "$dir" remote get-url forgejo >/dev/null 2>&1; then
+    if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] add forgejo remote -> $user:$sub"; return 0; fi
+    log "add forgejo remote -> $user:~/$sub"
+    runuser -u "$user" -- git -C "$dir" remote add forgejo "$FORGEJO_INFRA_REMOTE"
+  fi
+  [[ "$DRY_RUN" == 1 ]] && return 0
+  [[ "$(runuser -u "$user" -- git -C "$dir" rev-parse --abbrev-ref -q 'master@{upstream}' 2>/dev/null)" == forgejo/master ]] && return 0
+  runuser -u "$user" -- env GIT_TERMINAL_PROMPT=0 git -C "$dir" fetch --quiet forgejo 2>/dev/null \
+    || { log "WARN: forgejo fetch failed for $user — upstream left as-is"; return 0; }
+  runuser -u "$user" -- git -C "$dir" branch --set-upstream-to=forgejo/master master >/dev/null 2>&1 \
+    && log "set $user:~/$sub master upstream -> forgejo/master" \
+    || log "WARN: could not set $user:~/$sub master upstream to forgejo/master"
+}
+
+# Workspace layout: ~/code is a plain directory of per-project clones. A user
+# still on the single layout (~/code IS the infra clone) is migrated by moving
+# the whole clone — local branches, dirty files, untracked state all survive —
+# to ~/code/infra. Running processes follow the moved inode, so live sessions
+# keep working (their cwd lands inside ~/code/infra).
+ensure_workspace_layout() {
+  local user="$1" home tmp
+  home="$(getent passwd "$user" | cut -d: -f6)"
+  [[ -z "$home" ]] && return 0
+  if [[ -d "$home/code/.git" ]]; then
+    if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] migrate $user:~/code (single clone) -> ~/code/infra"; return 0; fi
+    log "migrate $user: ~/code (single infra clone) -> ~/code/infra"
+    tmp="$home/.code-workspace-migrate.$$"
+    mv "$home/code" "$tmp"
+    install -d -o "$user" -g "$user" -m 0755 "$home/code"
+    mv "$tmp" "$home/code/infra"
+  elif [[ ! -e "$home/code" ]]; then
+    if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] create workspace dir $user:~/code"; return 0; fi
+    install -d -o "$user" -g "$user" -m 0755 "$home/code"
+  fi
+}
+
+# Single-layout clones often accumulated nested project clones (the old layout
+# gave users nowhere else to put them — e.g. ancamilea's tripit inside ~/code).
+# After migration such a clone would sit buried at ~/code/infra/; hoist a
+# roster repo to its workspace home instead of stranding it + cloning fresh.
+# Only untracked git dirs move — content the infra repo tracks is never touched.
+hoist_nested_repo() {
+  local user="$1" repo="$2" home src dst
+  home="$(getent passwd "$user" | cut -d: -f6)"
+  [[ -z "$home" ]] && return 0
+  src="$home/code/infra/$repo"; dst="$home/code/$repo"
+  [[ -d "$src/.git" && ! -e "$dst" ]] || return 0
+  runuser -u "$user" -- git -C "$home/code/infra" ls-files --error-unmatch "$repo" >/dev/null 2>&1 && return 0
+  if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] hoist nested $repo -> $user:$dst"; return 0; fi
+  log "hoist nested $repo clone -> $user:~/code/$repo"
+  mv "$src" "$dst"
+}
+
+# Extra per-project repos for workspace-layout users, cloned from Forgejo AS
+# the user (their ~/.git-credentials PAT authenticates against private repos).
+# A failed clone (no access yet, offline) is a WARN — the reconcile must never
+# abort over a single repo; the next hourly run retries.
+install_user_repo() {
+  local user="$1" repo="$2" home dst
+  home="$(getent passwd "$user" | cut -d: -f6)"
+  [[ -z "$home" ]] && return 0
+  dst="$home/code/$repo"
+  [[ -e "$dst" || -L "$dst" ]] && return 0
+  if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] clone $REPO_REMOTE_BASE/$repo.git -> $user:$dst"; return 0; fi
+  log "clone $repo -> $user:~/code/$repo"
+  runuser -u "$user" -- env GIT_TERMINAL_PROMPT=0 git clone --quiet "$REPO_REMOTE_BASE/$repo.git" "$dst" 2>/dev/null \
+    || log "WARN: clone of $repo failed for $user (access/offline?) — skipped"
 }
 
 # Machine-wide Claude managed config: the repo file (in the admin tree, like the
@@ -218,7 +306,11 @@ jq -e . "$desired_file" >/dev/null || { echo "[t3-provision] derive produced inv
 sync_managed_config
 
 # 4) per-account: create-if-absent + ADDITIVE tier groups (never strip) + locked clone
-while IFS=$'\t' read -r os_user tier shell groups_csv; do
+# NB: empty @tsv fields collapse under tab-IFS read (tab is IFS whitespace), so
+# the jq below emits "-" for empty groups/repos and we map it back here.
+while IFS=$'\t' read -r os_user tier shell groups_csv code_layout repos_csv; do
+  [[ "$groups_csv" == "-" ]] && groups_csv=""
+  [[ "$repos_csv" == "-" ]] && repos_csv=""
   if ! id "$os_user" >/dev/null 2>&1; then
     log "create account: $os_user (shell $shell)"
     run useradd -m -s "$shell" "$os_user"
@@ -234,14 +326,29 @@ while IFS=$'\t' read -r os_user tier shell groups_csv; do
       log "add $os_user -> group $g"; run gpasswd -a "$os_user" "$g" >/dev/null
     done
   fi
-  if [[ "$tier" != admin ]]; then # non-admins: locked clone (kept fresh) + kubeconfig + shared Claude token
-    install_locked_clone "$os_user"
-    refresh_locked_clone "$os_user"
+  if [[ "$tier" != admin ]]; then # non-admins: locked clone(s) (kept fresh) + kubeconfig + shared Claude token
+    if [[ "$code_layout" == workspace ]]; then
+      ensure_workspace_layout "$os_user"
+      install_locked_clone "$os_user" code/infra
+      wire_forgejo_remote "$os_user" code/infra # before refresh: ff targets the canonical upstream same-pass
+      refresh_user_clone "$os_user" code/infra
+      IFS=',' read -ra extra_repos <<< "$repos_csv"
+      for repo in "${extra_repos[@]}"; do
+        [[ -n "$repo" ]] || continue
+        hoist_nested_repo "$os_user" "$repo"
+        install_user_repo "$os_user" "$repo"
+        refresh_user_clone "$os_user" "code/$repo"
+      done
+    else
+      install_locked_clone "$os_user" code
+      wire_forgejo_remote "$os_user" code # before refresh: ff targets the canonical upstream same-pass
+      refresh_user_clone "$os_user" code
+    fi
     install_user_kubeconfig "$os_user"
     install_user_claude_token "$os_user"
   fi
   refresh_codex_mirror "$os_user" # all tiers — mirror of the managed claudeMd
-done < <(jq -r '.accounts[] | [.os_user, .tier, .shell, (.groups|join(","))] | @tsv' "$desired_file")
+done < <(jq -r '.accounts[] | [.os_user, .tier, .shell, (if (.groups|length)==0 then "-" else (.groups|join(",")) end), .code_layout, (if (.repos|length)==0 then "-" else (.repos|join(",")) end)] | @tsv' "$desired_file")
 
 # 5) per-user .env (sticky port) + enable t3-serve@
 while IFS=$'\t' read -r os_user port; do
diff --git a/scripts/workstation/managed-settings.json b/scripts/workstation/managed-settings.json
index fd2d2a3b..aac4bfc1 100644
--- a/scripts/workstation/managed-settings.json
+++ b/scripts/workstation/managed-settings.json
@@ -1,4 +1,4 @@
 {
-  "claudeMd": "# Viktor Barzin homelab — shared multi-user Claude Code Workstation (devvm)\n\nYou are running as a specific OS user on a SHARED devvm Workstation, not as the admin. These org-wide rules apply to EVERY user and sit at the top of settings precedence (they cannot be overridden by a user's own config):\n\n- Respect your permission tier. Your kubectl, Vault, and infra access are scoped to your RBAC tier (admin / power-user / namespace-owner). Do not attempt to escalate privileges or reach another user's resources.\n- Secrets are per-user. Never read another user's home directory, credentials, tokens, or ~/.claude secrets. Your own secrets live in your home at mode 600.\n- Infrastructure changes go through Terraform/Terragrunt — never direct kubectl apply/edit/patch. Committed stack changes are auto-applied by CI on push to master; you can verify the live result with your read-only kubectl.\n- The AGENT does ALL git mechanics silently — the user may not know git, so never ask them to commit, push, pull, or open anything, and never surface git jargon. Feature-sized work is done in an isolated git worktree (`.worktrees/`, branch `/`) and merged into master when finished, so several agents can work the same project at once — full lifecycle in ~/.claude/rules/execution.md §3; trivial single-commit fixes may go straight to master. When you finish a change in ~/code: commit it ON master and push to the forgejo remote. THE COMMIT MESSAGE IS THE AUDIT TRAIL — subject says WHAT changed; body says WHY in plain words (paraphrase the user's actual request) — this matters more than the change itself. Never use [ci skip] as a non-admin (it would hide the change from the audit feed; harmless no-op applies are fine). If the push is rejected non-fast-forward, git pull --rebase forgejo master and push again. If it is rejected by branch protection (user not whitelisted), fall back to a / branch + PR via the Forgejo API (token = password field in ~/.git-credentials). Keep ~/code on a clean master when done so background auto-refresh keeps working. Tell the user in plain words what happened ('done — your change is live/recorded'). Full recipe: AGENTS.md → 'Non-admin workstation users' in ~/code.\n- Follow the engineering rules in ~/.claude/rules/ (execution, planning, quality) and every CLAUDE.md in the repo tree.\n- The monorepo is at ~/code. Non-admins get a git-crypt-LOCKED clone: secret files read as ciphertext — that is expected, not an error.",
+  "claudeMd": "# Viktor Barzin homelab — shared multi-user Claude Code Workstation (devvm)\n\nYou are running as a specific OS user on a SHARED devvm Workstation, not as the admin. These org-wide rules apply to EVERY user and sit at the top of settings precedence (they cannot be overridden by a user's own config):\n\n- Respect your permission tier. Your kubectl, Vault, and infra access are scoped to your RBAC tier (admin / power-user / namespace-owner). Do not attempt to escalate privileges or reach another user's resources.\n- Secrets are per-user. Never read another user's home directory, credentials, tokens, or ~/.claude secrets. Your own secrets live in your home at mode 600.\n- Infrastructure changes go through Terraform/Terragrunt — never direct kubectl apply/edit/patch. Committed stack changes are auto-applied by CI on push to master; you can verify the live result with your read-only kubectl.\n- The AGENT does ALL git mechanics silently — the user may not know git, so never ask them to commit, push, pull, or open anything, and never surface git jargon. Feature-sized work is done in an isolated git worktree (`.worktrees/`, branch `/`) and merged into master when finished, so several agents can work the same project at once — full lifecycle in ~/.claude/rules/execution.md §3; trivial single-commit fixes may go straight to master. When you finish a change in a repo under ~/code (or ~/code itself when it IS the clone): commit it ON master and push to the forgejo remote. THE COMMIT MESSAGE IS THE AUDIT TRAIL — subject says WHAT changed; body says WHY in plain words (paraphrase the user's actual request) — this matters more than the change itself. Never use [ci skip] as a non-admin (it would hide the change from the audit feed; harmless no-op applies are fine). If the push is rejected non-fast-forward, git pull --rebase forgejo master and push again. If it is rejected by branch protection (user not whitelisted), fall back to a / branch + PR via the Forgejo API (token = password field in ~/.git-credentials). Keep every clone on a clean master when done so background auto-refresh keeps working. Tell the user in plain words what happened ('done — your change is live/recorded'). Full recipe: AGENTS.md → 'Non-admin workstation users' in your infra clone.\n- Follow the engineering rules in ~/.claude/rules/ (execution, planning, quality) and every CLAUDE.md in the repo tree.\n- Code lives under ~/code, in one of two per-user layouts: either ~/code IS the git-crypt-LOCKED infra clone (single layout), or ~/code is a workspace directory of per-project clones — the locked infra clone at ~/code/infra plus other project repos alongside it (e.g. ~/code/tripit). [ -d ~/code/.git ] means single. In locked infra clones secret files read as ciphertext — that is expected, not an error.",
   "model": "claude-fable-5"
 }
diff --git a/scripts/workstation/roster.yaml b/scripts/workstation/roster.yaml
index 0319c824..b9059760 100644
--- a/scripts/workstation/roster.yaml
+++ b/scripts/workstation/roster.yaml
@@ -10,6 +10,14 @@
 # power-user - cluster-wide READ (no Secrets) via oidc-power-user-readonly; locked clone
 # namespace-owner - admin in their own namespace(s) only; locked clone
 #
+# Optional per-user code layout (non-admins):
+# code_layout: single (default) - ~/code IS the locked infra clone
+# code_layout: workspace - ~/code is a directory of per-project clones:
+# the locked infra clone at ~/code/infra, plus
+# each `repos` entry cloned from Forgejo
+# viktor/ with the user's own PAT.
+# A single-layout ~/code is auto-migrated.
+#
 # wizard IS listed (as admin): the reconcile REGENERATES /etc/ttyd-user-map +
 # dispatch.json from this file, so omitting him would drop his t3 instance. The
 # provisioner skips account/group/clone mutations for already-existing users, so
@@ -17,5 +25,5 @@
 users:
   wizard: {authentik_user: vbarzin, k8s_user: wizard, tier: admin} # base config author + cluster-admin
   emo: {authentik_user: emil.barzin, k8s_user: emo, tier: power-user} # NET-NEW k8s_users entry (add as power-user before provisioning)
-  ancamilea: {authentik_user: ancaelena98, k8s_user: anca, tier: namespace-owner, namespaces: [plotting-book]} # ALREADY provisioned in-cluster -- assert, don't re-create
+  ancamilea: {authentik_user: ancaelena98, k8s_user: anca, tier: namespace-owner, namespaces: [plotting-book], code_layout: workspace, repos: [tripit]} # ALREADY provisioned in-cluster -- assert, don't re-create
   # gheorghe: {authentik_user: vabbit81, k8s_user: vabbit81, tier: namespace-owner, namespaces: [vabbit81]} # already a cluster ns-owner; uncomment to give him a devvm workstation
diff --git a/scripts/workstation/roster_engine.py b/scripts/workstation/roster_engine.py
index d9e7dd71..6e1b8545 100644
--- a/scripts/workstation/roster_engine.py
+++ b/scripts/workstation/roster_engine.py
@@ -13,6 +13,7 @@ per person and are recorded explicitly (no email->username derivation).
 from __future__ import annotations
 
 import json
+import re
 import sys
 from dataclasses import dataclass, field
 from typing import Iterable
@@ -21,6 +22,13 @@ import yaml
 
 BASE_PORT = 3773
 VALID_TIERS = ("admin", "power-user", "namespace-owner")
+# single - ~/code IS the locked infra clone (the original non-admin layout)
+# workspace - ~/code is a plain directory of per-project clones; the locked
+# infra clone lives at ~/code/infra and `repos` clone alongside it
+VALID_CODE_LAYOUTS = ("single", "workspace")
+# Repo names become root-executed clone/mv paths under ~/code — plain
+# leading-alphanumeric names only (no separators, dotfiles, or option-like names).
+_REPO_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$") # Tier -> supplementary groups the reconcile ENSURES (additive-only; never stripped). TIER_GROUPS: dict[str, tuple[str, ...]] = { "admin": ("code-shared", "docker", "sudo"), @@ -48,6 +56,8 @@ class User: k8s_user: str tier: str namespaces: tuple[str, ...] = () + code_layout: str = "single" + repos: tuple[str, ...] = () @dataclass(frozen=True) @@ -62,6 +72,8 @@ class Account: shell: str login_locked: bool groups: tuple[str, ...] + code_layout: str = "single" + repos: tuple[str, ...] = () @dataclass(frozen=True) @@ -98,7 +110,31 @@ def _parse_user(os_user: str, spec: dict) -> User: raise RosterError(f"user {os_user!r}: namespace-owner requires namespaces") if tier != "namespace-owner" and namespaces: raise RosterError(f"user {os_user!r}: only namespace-owner may set namespaces") - return User(os_user, spec["authentik_user"], spec["k8s_user"], tier, namespaces) + code_layout = spec.get("code_layout", "single") + if code_layout not in VALID_CODE_LAYOUTS: + raise RosterError( + f"user {os_user!r}: unknown code_layout {code_layout!r} " + f"(valid: {list(VALID_CODE_LAYOUTS)})" + ) + repos = tuple(spec.get("repos") or ()) + if repos and code_layout != "workspace": + raise RosterError(f"user {os_user!r}: repos require code_layout: workspace") + for repo in repos: + if not _REPO_NAME_RE.match(repo): + raise RosterError(f"user {os_user!r}: unsafe repo name {repo!r}") + if "infra" in repos: + raise RosterError( + f"user {os_user!r}: infra is implicit at ~/code/infra — drop it from repos" + ) + return User( + os_user, + spec["authentik_user"], + spec["k8s_user"], + tier, + namespaces, + code_layout, + repos, + ) def load_roster(text: str) -> Roster: @@ -205,6 +241,8 @@ def derive_desired_state( shell=DEFAULT_SHELL, login_locked=True, groups=TIER_GROUPS[u.tier], + code_layout=u.code_layout, + repos=u.repos, ) for u in roster.users.values() } @@ -257,6 +295,8 @@ def _desired_state_to_dict(ds: DesiredState) -> dict: 
"shell": a.shell, "login_locked": a.login_locked, "groups": list(a.groups), + "code_layout": a.code_layout, + "repos": list(a.repos), } for name, a in ds.accounts.items() }, diff --git a/scripts/workstation/skel/start-claude.sh b/scripts/workstation/skel/start-claude.sh index 1a630366..9778b2fc 100755 --- a/scripts/workstation/skel/start-claude.sh +++ b/scripts/workstation/skel/start-claude.sh @@ -19,17 +19,25 @@ fi cd "$HOME/code" 2>/dev/null || cd "$HOME" -# Freshen ~/code at session start so the user begins on current upstream state -# (the hourly t3-provision-users reconcile does the same in the background). -# Fast-forward only, and only when safe (on master + clean tree); hard 15s cap so -# an offline remote never stalls the launch. No-op for repos without remotes. -if [ -d "$HOME/code/.git" ]; then - GIT_TERMINAL_PROMPT=0 timeout 15 git -C "$HOME/code" fetch --all --prune --quiet 2>/dev/null || true - if [ "$(git -C "$HOME/code" symbolic-ref --short -q HEAD)" = master ] \ - && [ -z "$(git -C "$HOME/code" status --porcelain 2>/dev/null)" ] \ - && git -C "$HOME/code" rev-parse --verify -q 'master@{upstream}' >/dev/null 2>&1; then - git -C "$HOME/code" merge --ff-only 'master@{upstream}' >/dev/null 2>&1 || true +# Freshen the user's clone(s) at session start so they begin on current upstream +# state (the hourly t3-provision-users reconcile does the same in the background). +# Single layout freshens ~/code itself; workspace layout freshens each repo under +# ~/code. Fast-forward only, and only when safe (on master + clean tree); hard +# 10s fetch cap per repo so an offline remote never stalls the launch. 
+freshen_repo() { + GIT_TERMINAL_PROMPT=0 timeout 10 git -C "$1" fetch --all --prune --quiet 2>/dev/null || true + if [ "$(git -C "$1" symbolic-ref --short -q HEAD)" = master ] \ + && [ -z "$(git -C "$1" status --porcelain 2>/dev/null)" ] \ + && git -C "$1" rev-parse --verify -q 'master@{upstream}' >/dev/null 2>&1; then + git -C "$1" merge --ff-only 'master@{upstream}' >/dev/null 2>&1 || true fi +} +if [ -d "$HOME/code/.git" ]; then + freshen_repo "$HOME/code" +else + for repo_git in "$HOME"/code/*/.git; do + [ -d "$repo_git" ] && freshen_repo "${repo_git%/.git}" + done fi # Prefer the system-wide `claude` (installed by setup-devvm.sh); fall back to npx. diff --git a/scripts/workstation/test_roster_engine.py b/scripts/workstation/test_roster_engine.py index fe19c90e..ac34969c 100644 --- a/scripts/workstation/test_roster_engine.py +++ b/scripts/workstation/test_roster_engine.py @@ -86,6 +86,97 @@ def test_missing_users_key_is_valid_empty(): assert _roster("{}").users == {} +# -------------------------------------------------------------------------- +# code_layout + repos: per-user workspace layout (~/code/ clones) +# -------------------------------------------------------------------------- + + +def test_code_layout_defaults_to_single_with_no_repos(): + r = _roster("users: {emo: {authentik_user: e, k8s_user: emo, tier: power-user}}") + assert r.users["emo"].code_layout == "single" + assert r.users["emo"].repos == () + + +def test_workspace_layout_carries_repos(): + r = _roster( + """ + users: + ancamilea: {authentik_user: ancaelena98, k8s_user: anca, + tier: namespace-owner, namespaces: [plotting-book], + code_layout: workspace, repos: [tripit]} + """ + ) + u = r.users["ancamilea"] + assert u.code_layout == "workspace" + assert u.repos == ("tripit",) + + +def test_rejects_unknown_code_layout(): + with pytest.raises(eng.RosterError, match="code_layout"): + _roster( + "users: {bob: {authentik_user: b, k8s_user: b, tier: power-user, " + "code_layout: flat}}" + ) + + 
+def test_repos_require_workspace_layout(): + # repos clone to ~/code/, which only exists under the workspace layout. + with pytest.raises(eng.RosterError, match="workspace"): + _roster( + "users: {bob: {authentik_user: b, k8s_user: b, tier: power-user, " + "repos: [tripit]}}" + ) + + +@pytest.mark.parametrize("bad", ["../evil", "a/b", "", ".hidden", "-flag"]) +def test_rejects_path_unsafe_repo_name(bad): + # Repo names become root-executed clone/mv paths — reject anything that + # isn't a plain leading-alphanumeric name. + with pytest.raises(eng.RosterError, match="repo"): + _roster( + "users: {bob: {authentik_user: b, k8s_user: b, tier: power-user, " + f"code_layout: workspace, repos: ['{bad}']" "}}" + ) + + +def test_rejects_infra_in_repos(): + # The infra clone is implicit at ~/code/infra for workspace users. + with pytest.raises(eng.RosterError, match="implicit"): + _roster( + "users: {bob: {authentik_user: b, k8s_user: b, tier: power-user, " + "code_layout: workspace, repos: [infra]}}" + ) + + +def test_derive_accounts_carry_code_layout_and_repos(): + r = _roster( + """ + users: + emo: {authentik_user: e, k8s_user: emo, tier: power-user} + ancamilea: {authentik_user: a, k8s_user: anca, tier: namespace-owner, + namespaces: [plotting-book], code_layout: workspace, + repos: [tripit]} + """ + ) + ds = eng.derive_desired_state(r, {}) + assert ds.accounts["emo"].code_layout == "single" + assert ds.accounts["emo"].repos == () + assert ds.accounts["ancamilea"].code_layout == "workspace" + assert ds.accounts["ancamilea"].repos == ("tripit",) + + +def test_desired_state_dict_includes_code_layout_and_repos(): + # The JSON adapter is the contract the bash provisioner consumes via jq. 
+ r = _roster( + "users: {ancamilea: {authentik_user: a, k8s_user: anca, " + "tier: namespace-owner, namespaces: [plotting-book], " + "code_layout: workspace, repos: [tripit]}}" + ) + d = eng._desired_state_to_dict(eng.derive_desired_state(r, {})) + assert d["accounts"]["ancamilea"]["code_layout"] == "workspace" + assert d["accounts"]["ancamilea"]["repos"] == ["tripit"] + + # -------------------------------------------------------------------------- # validate_tiers: roster tier vs live k8s_users (fail-loud, module #1) # --------------------------------------------------------------------------
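
Reviewer sketch (not part of the patch): the `code_layout` / `repos` rules that `_parse_user` enforces above can be exercised in isolation, without PyYAML or the rest of `roster_engine.py`. The function name `check_layout` and the use of `ValueError` are hypothetical stand-ins for `_parse_user` and `RosterError`. Two guards here are assumptions that go beyond what the diff does: `fullmatch` is used instead of `match` with `$` (in Python `$` also matches just before a trailing newline), and a bare string under `repos` is rejected, since `tuple("tripit")` would otherwise yield one-character names that each pass the pattern.

```python
from __future__ import annotations

import re

# Same character classes as the diff's _REPO_NAME_RE; anchoring is left to
# fullmatch() below (assumption: stricter than the patch, which uses ^...$).
REPO_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
VALID_CODE_LAYOUTS = ("single", "workspace")


def check_layout(spec: dict) -> tuple[str, tuple[str, ...]]:
    """Return (code_layout, repos) for one roster entry, or raise ValueError.

    Hypothetical stand-in for the layout checks in _parse_user.
    """
    layout = spec.get("code_layout", "single")
    if layout not in VALID_CODE_LAYOUTS:
        raise ValueError(f"unknown code_layout {layout!r}")

    raw = spec.get("repos") or []
    # Extra guard (assumption): a scalar string would be split into characters
    # by tuple(), so only accept real sequences.
    if not isinstance(raw, (list, tuple)):
        raise ValueError("repos must be a list")
    repos = tuple(raw)

    if repos and layout != "workspace":
        raise ValueError("repos require code_layout: workspace")
    for repo in repos:
        # Names become paths under ~/code, so reject separators, dotfiles,
        # option-like names, non-strings, and trailing newlines.
        if not isinstance(repo, str) or not REPO_NAME_RE.fullmatch(repo):
            raise ValueError(f"unsafe repo name {repo!r}")
    if "infra" in repos:
        raise ValueError("infra is implicit at ~/code/infra")
    return layout, repos


if __name__ == "__main__":
    print(check_layout({"code_layout": "workspace", "repos": ["tripit"]}))
```

If the stricter guards are wanted in the real module, they would be small additions to `_parse_user`; whether they are worth it depends on how much the roster file is trusted.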