From c611ecf84dd3203dec928b60afe9b882dc352a95 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Mon, 8 Jun 2026 14:27:17 +0000
Subject: [PATCH] =?UTF-8?q?workstation:=20docs=20=E2=80=94=20multi-tenancy?=
 =?UTF-8?q?=20Workstation=20section=20+=20offboard=20runbook=20+=20service?=
 =?UTF-8?q?-catalog=20fix=20[ci=20skip]?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

multi-tenancy.md: new DevVM Workstation section (roster SSoT, tiers, config inheritance, locked clone, built-vs-gated status).
service-catalog.md t3code row: corrected the stale 'source of truth = /etc/ttyd-user-map' (now roster.yaml; the map/dispatch are GENERATED).
offboard-user.md: written (was a referenced-but-missing dead link) — staged reversible-cut-then-gated-destructive for both cluster + workstation surfaces.

Co-Authored-By: Claude Opus 4.8
---
 .claude/reference/service-catalog.md |  2 +-
 docs/architecture/multi-tenancy.md   | 14 ++++++
 docs/runbooks/offboard-user.md       | 72 ++++++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 docs/runbooks/offboard-user.md

diff --git a/.claude/reference/service-catalog.md b/.claude/reference/service-catalog.md
index 1a722452..2a8ce952 100644
--- a/.claude/reference/service-catalog.md
+++ b/.claude/reference/service-catalog.md
@@ -32,7 +32,7 @@
 |---------|-------------|-------|
 | k8s-dashboard | Kubernetes dashboard at `k8s.viktorbarzin.me`. **Forward-auth + auto-injected SA token** (apiserver OIDC blocked, see design §12). nginx token-injector (`dashboard_injector.tf`) maps `X-authentik-username` → the user's `dashboard-` SA token (ns admin + read-only on namespace-list/nodes only via `dashboard-nav-readonly` — no cross-tenant reads, `rbac/.../dashboard-sa.tf`; admins → cluster-admin SA) and sets `Authorization: Bearer` → no token-paste, dashboard auto-authenticates per user. Forward-auth admits `kubernetes-*` groups for this host (`stacks/authentik/admin-services-restriction.tf`). oauth2-proxy + `k8s-dashboard` OIDC app built but idle. | k8s-dashboard |
 | reverse-proxy | Generic reverse proxy | reverse-proxy |
-| t3code | Multi-user coding-agent GUI at t3.viktorbarzin.me. `auth=required` (Authentik) → DevVM `t3-dispatch` service (`10.0.10.10:3780`, unprivileged user) maps `X-authentik-username` → that user's own `t3-serve@` instance (file perms enforced by uid; wizard→:3773, emo→:3774; unmapped→403) and **auto-injects the t3 session on first visit** (mints via the root `t3-mint` wrapper, scoped sudoers → `/api/auth/bootstrap` `t3_session` cookie). Source of truth `/etc/ttyd-user-map`; `t3-provision-users` reconcile (systemd timer) turns map entries into `t3-serve@` instances + `dispatch.json`. **Add a user:** one line in `/etc/ttyd-user-map` (must already be an OS account + Authentik identity) → reconcile. DevVM artifacts versioned in `infra/scripts/` (`t3-serve@.service`, `t3-provision-users`, `t3-dispatch/`, `t3-mint`, `sudoers-t3-autopair`, `t3-autoupdate.*`); TF (`stacks/t3code`) owns only the ingress + Endpoints→:3780. **t3 binary tracks `nightly`** via `t3-autoupdate` (daily systemd timer; health-check + auto-rollback on a bad build; restarts only idle instances) — so new models (e.g. Opus 4.8) land as t3 ships them. Native app/app.t3.codes unsupported (cross-origin) — deferred until published. Design: `docs/plans/2026-06-01-t3-auto-provision-*`. | t3code |
+| t3code | Multi-user coding-agent GUI at t3.viktorbarzin.me. `auth=required` (Authentik) → DevVM `t3-dispatch` service (`10.0.10.10:3780`, unprivileged user) maps `X-authentik-username` → that user's own `t3-serve@` instance (file perms enforced by uid; wizard→:3773, emo→:3774; unmapped→403) and **auto-injects the t3 session on first visit** (mints via the root `t3-mint` wrapper, scoped sudoers → `/api/auth/bootstrap` `t3_session` cookie). **Source of truth = `infra/scripts/workstation/roster.yaml`** (os_user → authentik_user/k8s_user/tier/namespaces); `roster_engine.py` (pytest-covered) derives desired state and `t3-provision-users` (hourly systemd timer) applies it — constrained accounts, additive per-tier groups, `t3-serve@` instances, and **regenerating** `/etc/ttyd-user-map` + `dispatch.json` (those two are now GENERATED — do not hand-edit). New non-admins inherit wizard's Claude config (machine-wide managed `claudeMd` in `/etc/claude-code/managed-settings.json` + per-user `~/.claude/{skills,rules}` symlinks seeded by `/etc/skel`) and get a **writable git-crypt-LOCKED** infra clone at `~/code` (code plaintext, secret files ciphertext). Tiers: admin / power-user (cluster-wide read-only) / namespace-owner. **Add a user:** one entry in `roster.yaml` → reconcile (the per-user OIDC kubeconfig + Authentik `T3 Users` gate are separate gated cluster/auth applies). DevVM artifacts versioned in `infra/scripts/` (`t3-serve@.service`, `t3-provision-users` + `workstation/{roster.yaml,roster_engine.py,setup-devvm.sh,managed-settings.json,skel/}`, `t3-dispatch/`, `t3-mint`, `sudoers-t3-autopair`, `t3-autoupdate.*`); TF (`stacks/t3code`) owns only the ingress + Endpoints→:3780. **t3 binary tracks `nightly`** via `t3-autoupdate` (daily systemd timer; health-check + auto-rollback on a bad build; restarts only idle instances) — so new models (e.g. Opus 4.8) land as t3 ships them. Native app/app.t3.codes unsupported (cross-origin) — deferred until published. Design: `docs/plans/2026-06-01-t3-auto-provision-*`. | t3code |
 
 ## Active Use
 | Service | Description | Stack |
diff --git a/docs/architecture/multi-tenancy.md b/docs/architecture/multi-tenancy.md
index 58238122..ad27e149 100644
--- a/docs/architecture/multi-tenancy.md
+++ b/docs/architecture/multi-tenancy.md
@@ -533,6 +533,20 @@ cd stacks/actualbudget/factory
 terragrunt apply
 ```
 
+## DevVM Workstation (Claude Code multi-user)
+
+Separate from the in-cluster namespace-owner model above, the **devvm** (`10.0.10.10`, VMID 102) hosts per-user **Claude Code Workstations** behind `t3.viktorbarzin.me`. It reuses the same identity backbone — the Vault `k8s_users` map and Authentik — but adds a devvm-side layer (roster, OS accounts, per-user t3 instances). Authoritative design + phased plan: `docs/plans/2026-06-07-multi-user-workstation-{design,plan}.md` (PRD: ViktorBarzin/infra#9).
+
+**Single source of truth:** `infra/scripts/workstation/roster.yaml` (`os_user → authentik_user / k8s_user / tier / namespaces`). `roster_engine.py` (pytest-covered pure core) derives desired state; `t3-provision-users` (hourly timer) applies it — **additive-only** for existing users (never strips a group, replaces a home, or re-locks an account). `/etc/ttyd-user-map` + `dispatch.json` are **generated** from the roster (do not hand-edit).
+
+**RBAC tiers:** `admin` (Viktor — cluster-admin, unlocked tree, secrets) · `power-user` (cluster-wide read-only, NO Secrets, via a dedicated `oidc-power-user-readonly` ClusterRole) · `namespace-owner` (admin in own namespace only). By design, each session acts as the user's **own** OIDC identity (kubelogin), never the admin's; the per-user kubeconfig and the ClusterRole are still gated (see Status below).
+
+**Config inheritance (live):** wizard authors the base (his chezmoi-versioned `~/.claude`). Two native layers carry it to every user — the enforced org `claudeMd` in `/etc/claude-code/managed-settings.json` (top precedence, all sessions) and per-user `~/.claude/{skills,rules,…}` **symlinks** to the base (seeded via `/etc/skel`; edits propagate live). Secrets stay per-user at mode 600, never symlinked.
+
+**Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo at `~/code` — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext. Changes are ungated (push ≠ apply); the real boundary is apply-time (`scripts/tg apply` needs an admin Vault token + cluster RBAC).
+
+**Status (2026-06-08):** built + verified on the live host — capacity (8 GiB swap), config inheritance, roster-driven provisioner, per-user locked clone. **Gated / pending:** per-user OIDC kubeconfig + the `oidc-power-user-readonly` ClusterRole + emo's `k8s_users` entry, the Authentik `T3 Users` edge gate, the emo cutover (Phase 5), and the offboarding apply-side (Phase 7). See `../runbooks/offboard-user.md` for deprovisioning.
+
 ## Related
 
 - [CI/CD Pipeline](./ci-cd.md) — Per-user Woodpecker pipelines
diff --git a/docs/runbooks/offboard-user.md b/docs/runbooks/offboard-user.md
new file mode 100644
index 00000000..104f4fcd
--- /dev/null
+++ b/docs/runbooks/offboard-user.md
@@ -0,0 +1,72 @@
+# Runbook: Offboard a User
+
+Removing a user can span two surfaces — the **in-cluster** namespace-owner model
+(Vault `k8s_users` / RBAC / namespace) and the **devvm Workstation** (roster /
+OS account / t3 instance). Both are **staged**: a *reversible cut* (revoke access,
+delete nothing) first, then an explicit, gated *destructive removal*. Do the
+reversible cut immediately; only do the destructive step once you're sure.
+
+> Architecture: `../architecture/multi-tenancy.md`. Workstation design:
+> `../plans/2026-06-07-multi-user-workstation-design.md`.
+
+---
+
+## Part A — DevVM Workstation offboarding
+
+Driven by removing the user's entry from `infra/scripts/workstation/roster.yaml`.
+`roster_engine.py offboard_plan` computes the staged actions (reversible cut vs the
+gated `userdel_archive`, which is **never** auto-applied).
+
+### A1. Reversible cut (revoke access; delete nothing)
+
+1. **Delete the user's entry** from `roster.yaml`; commit + push.
+2. **Reconcile** (`sudo /usr/local/bin/t3-provision-users`, or wait for the hourly
+   timer). This **regenerates** `/etc/ttyd-user-map` + `dispatch.json` *without* the
+   user → `t3-dispatch` now returns **403** for them. *(Automated.)*
+3. **Disable their instance + lock login** *(manual today; Phase 7 will fold this into
+   the reconcile)*, with `OFFBOARD_USER` set to their OS username:
+   ```bash
+   sudo systemctl disable --now "t3-serve@${OFFBOARD_USER:?}.service"
+   sudo passwd -l "${OFFBOARD_USER:?}"
+   ```
+4. **Verify:** they can no longer use `t3.viktorbarzin.me` (`t3-dispatch` answers 403;
+   once the `T3 Users` edge gate is live, Authentik denies them first, see Part C) and
+   they cannot log in. Nothing is deleted; re-adding the roster entry + reconcile fully restores them.
+
+### A2. Destructive removal (explicit, gated — NEVER automatic)
+
+Only after the reversible cut and a deliberate decision:
+```bash
+sudo tar czf "/mnt/backup/offboard/${OFFBOARD_USER:?}-$(date +%Y%m%d).tar.gz" "/home/${OFFBOARD_USER:?}" &&
+  sudo userdel -r "${OFFBOARD_USER:?}"  # runs only if the archive succeeded; removes home + mail spool (IRREVERSIBLE)
+```
+Rollback before this step: re-add the roster entry + reconcile. After it: re-add the
+entry and reconcile to recreate the account, then restore its home from the archive.
+
+---
+
+## Part B — In-cluster (namespace-owner) offboarding
+
+1. **Reversible cut:** remove the user's Authentik group membership (edge/RBAC blocked)
+   and their entry from the Vault `k8s_users` map (`secret/platform`).
+2. **Apply:** `scripts/tg apply` the `vault` → `platform` → `woodpecker` stacks (drops the
+   RBAC binding, Vault identity/policy, and per-user CI). Their OIDC kubeconfig stops
+   authorizing immediately.
+3. **Destructive (gated):** deleting their namespace(s) removes all their workloads +
+   data — back up first (PVCs, DBs), then delete only on explicit decision.
+
+---
+
+## Part C — Authentik (both surfaces)
+
+Remove the user from the relevant Authentik group(s) — `kubernetes-namespace-owners`
+(cluster) and/or `T3 Users` (workstation edge gate). This is the edge revocation; do
+it as part of the reversible cut so they're locked out at the front door.
+
+---
+
+## Order of operations
+
+Reversible cut on **all** relevant surfaces first (Authentik group → roster removal +
+reconcile → `k8s_users` removal + apply) → verify access is gone → only then the gated
+destructive steps (`userdel -r`, namespace deletion), each after its own archive.
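The devvm half of the order of operations above can be sketched as a small dry-run-first shell helper. This is a sketch under stated assumptions, not existing tooling: the function names `reversible_cut` and `destructive_remove` and the `DRY_RUN` / `CONFIRM_DESTROY` variables are hypothetical. Only the commands themselves come from Part A, and the Authentik, roster-commit and `k8s_users` steps stay manual.

```bash
#!/usr/bin/env bash
# Hypothetical devvm offboarding helper (sketch only). The commands mirror
# Part A of the runbook; function and variable names are illustrative.
set -euo pipefail

# Print the command (prefixed with "+") instead of running it, unless DRY_RUN=0.
# Defaulting to a dry run keeps an accidental invocation harmless.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# A1, steps 2-3: assumes the roster entry was already removed (step 1).
# Reconcile, then stop the user's t3 instance and lock the OS account.
# Nothing is deleted here.
reversible_cut() {
  local user="${1:?usage: reversible_cut <os_user>}"
  run sudo /usr/local/bin/t3-provision-users
  run sudo systemctl disable --now "t3-serve@${user}.service"
  run sudo passwd -l "$user"
}

# A2: archive the home directory, then delete the account. Refuses unless
# CONFIRM_DESTROY repeats the username; userdel runs only if tar succeeded.
destructive_remove() {
  local user="${1:?usage: destructive_remove <os_user>}"
  if [ "${CONFIRM_DESTROY:-}" != "$user" ]; then
    echo "refusing: set CONFIRM_DESTROY=$user to remove $user" >&2
    return 1
  fi
  local archive="/mnt/backup/offboard/${user}-$(date +%Y%m%d).tar.gz"
  run sudo tar czf "$archive" "/home/$user" &&
    run sudo userdel -r "$user"
}
```

With the default `DRY_RUN=1`, `reversible_cut alice` (for a hypothetical user `alice`) only prints the three commands it would run; `DRY_RUN=0` executes them. `destructive_remove` refuses unless `CONFIRM_DESTROY` repeats the username, and the `&&` means `userdel -r` never runs if the archive step fails, matching the "archive first, delete only on explicit decision" rule.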