workstation: v2 membership implementation plan [ci skip]

8 tasks: engine derive_os_user + roster_from_members (TDD); read-only Authentik token (TF); setup-devvm.sh stages it; provisioner sources T3 Users members from the Authentik API (replaces roster.yaml); Authentik-managed membership + legacy os_user attributes; retire roster.yaml; e2e add/remove smoke. Pairs with the 2026-06-09 design doc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-09 12:09:14 +00:00
parent 48013a4a92
commit fbcc330214


@ -0,0 +1,469 @@
# Workstation Membership v2 — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax. This is **infra** work: the engine tasks are real pytest TDD; the host/Authentik tasks "verify" via an idempotent re-run + a smoke check with expected output. Honor the Terraform-only rule for cluster/Authentik changes (`scripts/tg apply`); devvm host scripts are the accepted exception. Claim `host:devvm` before host mutations and `stack:authentik` before applying Authentik.

**Goal:** Make the Authentik `T3 Users` group membership the single source of truth for who gets a devvm workstation account, identified by email; retire `roster.yaml`.

**Architecture:** The provisioner reads `T3 Users` members from the Authentik API (read-only token) instead of `roster.yaml`. A pure engine derives the Linux `os_user` from each member's email (or an `os_user` Authentik attribute override) and produces the same desired-state shape v1 already applies. Workstation access stays fully decoupled from cluster RBAC (`k8s_users` untouched). wizard is special-cased as the admin/owner.

**Tech Stack:** Python (pure engine, pytest) + Bash (provisioner) + `jq`/`curl` (Authentik API) + Terraform (`stacks/authentik`: read-only token, drop HCL members).

**Design:** `infra/docs/plans/2026-06-09-workstation-authentik-membership-design.md`.
---
## File structure
- Modify: `infra/scripts/workstation/roster_engine.py` — add `derive_os_user()` + `roster_from_members()` (pure).
- Modify: `infra/scripts/workstation/test_roster_engine.py` — tests for the two new functions.
- Modify: `infra/scripts/t3-provision-users.sh` — source members from the Authentik API instead of `roster.yaml`.
- Modify: `infra/scripts/workstation/setup-devvm.sh` — drop the read-only Authentik token to `/etc/t3-serve/authentik-token`.
- Create: `infra/stacks/authentik/t3-provision-token.tf` — read-only service account + API token.
- Modify: `infra/stacks/authentik/t3-users.tf` — drop the HCL `users` list (membership becomes Authentik-managed).
- Delete: `infra/scripts/workstation/roster.yaml` (Task 7).
- Modify: `infra/.claude/reference/service-catalog.md`, `infra/docs/architecture/multi-tenancy.md` (Task 7).
---
## Task 1: Engine — `derive_os_user()`
**Files:** Modify `infra/scripts/workstation/roster_engine.py`; Test `infra/scripts/workstation/test_roster_engine.py`
- [ ] **Step 1: Write the failing tests** (append to `test_roster_engine.py`)
```python
# --- derive_os_user: email/attribute -> Linux username (v2) ---
def test_derive_os_user_sanitizes_email_local_part():
    assert eng.derive_os_user("emil.barzin@gmail.com", None) == "emil_barzin"


def test_derive_os_user_attribute_overrides():
    assert eng.derive_os_user("emil.barzin@gmail.com", "emo") == "emo"


def test_derive_os_user_lowercases_and_replaces_unsafe_runs():
    assert eng.derive_os_user("Weird.Name+tag@x.com", None) == "weird_name_tag"


def test_derive_os_user_truncates_to_32():
    long = ("a" * 40) + "@x.com"
    assert eng.derive_os_user(long, None) == "a" * 32


def test_derive_os_user_blank_attribute_is_ignored():
    assert eng.derive_os_user("emil.barzin@gmail.com", "") == "emil_barzin"
```
- [ ] **Step 2: Run to verify they fail**
Run: `cd infra/scripts/workstation && python3 -m pytest test_roster_engine.py -k derive_os_user -q`
Expected: FAIL — `AttributeError: module 'roster_engine' has no attribute 'derive_os_user'`
- [ ] **Step 3: Implement** (add to `roster_engine.py`, after `RosterError`)
```python
import re  # place with the module's other top-level imports, not after RosterError

_MAX_USERNAME = 32


def derive_os_user(email: str, os_user_attr: str | None) -> str:
    """Linux username for a workstation member: the explicit `os_user` Authentik
    attribute if set, else the email local-part sanitized to a valid username
    (lowercase; runs of non [a-z0-9_-] -> '_'; stripped; <=32 chars)."""
    if os_user_attr:
        return os_user_attr
    local = email.split("@", 1)[0].lower()
    cleaned = re.sub(r"[^a-z0-9_-]+", "_", local).strip("_")
    return cleaned[:_MAX_USERNAME]
```
- [ ] **Step 4: Run to verify they pass**
Run: `python3 -m pytest test_roster_engine.py -k derive_os_user -q`
Expected: PASS (5 passed)
- [ ] **Step 5: Commit**
```bash
cd /home/wizard/code/infra
git add scripts/workstation/roster_engine.py scripts/workstation/test_roster_engine.py
git commit -m "workstation: engine derive_os_user (email/attribute -> Linux username)"
```
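The sanitization rule can be exercised on a few inputs the Step 1 tests do not cover. The sketch below repeats `derive_os_user()` from Step 3 so it runs on its own; `looks_like_valid_username()` and its regex are hypothetical additions (an assumption about what the host's `useradd` accepts), not part of this plan.

```python
from __future__ import annotations

import re

_MAX_USERNAME = 32
# Assumption: a conservative username pattern; the real rule depends on the
# host's useradd/adduser configuration.
_VALID_USERNAME = re.compile(r"[a-z_][a-z0-9_-]{0,31}")


def derive_os_user(email: str, os_user_attr: str | None) -> str:
    """Copy of the Step 3 implementation, repeated so this sketch runs alone."""
    if os_user_attr:
        return os_user_attr
    local = email.split("@", 1)[0].lower()
    cleaned = re.sub(r"[^a-z0-9_-]+", "_", local).strip("_")
    return cleaned[:_MAX_USERNAME]


def looks_like_valid_username(name: str) -> bool:
    """Hypothetical guard: non-empty and starting with a letter or underscore."""
    return _VALID_USERNAME.fullmatch(name) is not None


if __name__ == "__main__":
    for email in ("emil.barzin@gmail.com", "+++@x.com", "123abc@x.com"):
        name = derive_os_user(email, None)
        print(f"{email!r} -> {name!r} valid={looks_like_valid_username(name)}")
```

Under the assumed pattern, a local-part made only of unsafe characters derives to an empty string and a digit-leading local-part derives to a digit-leading name. If such members can occur, setting an explicit `os_user` attribute is one way to sidestep both cases.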
---
## Task 2: Engine — `roster_from_members()`
Builds a `Roster` (the v1 type `derive_desired_state` already consumes) from the Authentik member list, so the existing tested derivation is reused unchanged.
**Files:** Modify `roster_engine.py`; Test `test_roster_engine.py`
- [ ] **Step 1: Write the failing tests**
```python
# --- roster_from_members: Authentik members -> Roster (v2) ---
MEMBERS = [
    {"email": "vbarzin@gmail.com", "os_user": "wizard"},
    {"email": "emil.barzin@gmail.com", "os_user": "emo"},
    {"email": "ancaelena98@gmail.com", "os_user": "ancamilea"},
]
ADMINS = {"vbarzin@gmail.com"}


def test_roster_from_members_maps_identity_fields():
    r = eng.roster_from_members(MEMBERS, ADMINS)
    u = r.users["emo"]
    assert u.os_user == "emo"
    assert u.authentik_user == "emil.barzin"  # email local-part = t3-dispatch key
    assert u.k8s_user == "emil.barzin@gmail.com"  # email = identity
    assert u.tier == "power-user"  # non-admin


def test_roster_from_members_admin_by_email():
    r = eng.roster_from_members(MEMBERS, ADMINS)
    assert r.users["wizard"].tier == "admin"


def test_roster_from_members_derives_os_user_when_no_override():
    r = eng.roster_from_members([{"email": "jane.doe@x.com", "os_user": None}], set())
    assert "jane_doe" in r.users
    assert r.users["jane_doe"].tier == "power-user"


def test_roster_from_members_raises_on_os_user_collision():
    members = [{"email": "a@x.com", "os_user": "dup"}, {"email": "b@y.com", "os_user": "dup"}]
    with pytest.raises(eng.RosterError, match="collision"):
        eng.roster_from_members(members, set())


def test_roster_from_members_reuses_derive_desired_state():
    r = eng.roster_from_members(MEMBERS, ADMINS)
    ds = eng.derive_desired_state(r, {"wizard": 3773, "emo": 3774, "ancamilea": 3775})
    assert ds.dispatch["emil.barzin"] == {"os_user": "emo", "port": 3774}
    assert ds.accounts["wizard"].groups == ("code-shared", "docker", "sudo")
    assert ds.accounts["emo"].groups == ()
```
- [ ] **Step 2: Run to verify they fail**
Run: `python3 -m pytest test_roster_engine.py -k roster_from_members -q`
Expected: FAIL — `AttributeError: ... 'roster_from_members'`
- [ ] **Step 3: Implement** (add to `roster_engine.py`)
```python
def roster_from_members(members: list[dict], admin_emails: set[str]) -> Roster:
    """Build a Roster from Authentik `T3 Users` members. Each member dict has
    `email` and optional `os_user`. tier = admin iff the email is in admin_emails,
    else power-user (a non-admin workstation: no groups, locked clone). Raises on
    an os_user collision (two emails resolving to the same Linux username)."""
    users: dict[str, User] = {}
    for m in members:
        email = m["email"]
        os_user = derive_os_user(email, m.get("os_user"))
        if os_user in users:
            raise RosterError(
                f"os_user collision: {email!r} and {users[os_user].k8s_user!r} "
                f"both resolve to {os_user!r} (set an os_user attribute to disambiguate)"
            )
        tier = "admin" if email in admin_emails else "power-user"
        users[os_user] = User(
            os_user=os_user,
            authentik_user=email.split("@", 1)[0],
            k8s_user=email,
            tier=tier,
            namespaces=(),
        )
    return Roster(users)
```
- [ ] **Step 4: Run the whole suite**
Run: `python3 -m pytest test_roster_engine.py -q && ruff check roster_engine.py test_roster_engine.py`
Expected: PASS (all, incl. the v1 tests) + ruff clean
- [ ] **Step 5: Commit**
```bash
git add scripts/workstation/roster_engine.py scripts/workstation/test_roster_engine.py
git commit -m "workstation: engine roster_from_members (Authentik members -> Roster, reuses derive)"
```
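A collision can also arise without any override, when two different emails sanitize to the same name. The sketch below illustrates that case and the override that resolves it. `Member` and `build()` are hypothetical, trimmed stand-ins for the engine's `User` and `roster_from_members()` (whose full definitions are not shown in this plan), and the example emails are made up.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


class RosterError(Exception):
    """Stand-in for the engine's RosterError."""


@dataclass(frozen=True)
class Member:
    """Hypothetical, trimmed stand-in for the engine's User type."""
    os_user: str
    email: str
    tier: str


def derive_os_user(email: str, os_user_attr: str | None) -> str:
    if os_user_attr:
        return os_user_attr
    local = email.split("@", 1)[0].lower()
    return re.sub(r"[^a-z0-9_-]+", "_", local).strip("_")[:32]


def build(members: list[dict], admin_emails: set[str]) -> dict[str, Member]:
    """Same collision rule as roster_from_members, on the trimmed type."""
    users: dict[str, Member] = {}
    for m in members:
        name = derive_os_user(m["email"], m.get("os_user"))
        if name in users:
            raise RosterError(
                f"os_user collision: {m['email']!r} and {users[name].email!r} -> {name!r}"
            )
        tier = "admin" if m["email"] in admin_emails else "power-user"
        users[name] = Member(name, m["email"], tier)
    return users


if __name__ == "__main__":
    clash = [{"email": "jane.doe@x.com"}, {"email": "jane_doe@y.com"}]
    try:
        build(clash, set())
    except RosterError as exc:
        print("rejected:", exc)
    # An explicit os_user attribute on one member disambiguates.
    clash[1]["os_user"] = "jane2"
    print(sorted(build(clash, set())))
```

Raising on the collision rather than picking a winner keeps the reconcile from silently mapping two people onto one Linux account.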
---
## Task 3: Read-only Authentik token (Terraform)
**Files:** Create `infra/stacks/authentik/t3-provision-token.tf`
- [ ] **Step 1: Write the resources** (service account + API token + view permissions)
```hcl
# Read-only service account whose token the devvm provisioner uses to list
# "T3 Users" members. View-only: it can read users + groups, nothing else.
resource "authentik_user" "t3_provision" {
  username = "t3-provision-bot"
  name     = "T3 Provision (read-only)"
  type     = "service_account"
  path     = "service-accounts"
}

resource "authentik_token" "t3_provision" {
  identifier   = "t3-provision-readonly"
  user         = authentik_user.t3_provision.id
  intent       = "api"
  description  = "devvm t3-provision-users: read T3 Users membership"
  retrieve_key = true
}

# Global view permissions for the service account (users + groups read only).
resource "authentik_rbac_permission_user" "t3_provision_view_user" {
  user       = authentik_user.t3_provision.id
  permission = "authentik_core.view_user"
}

resource "authentik_rbac_permission_user" "t3_provision_view_group" {
  user       = authentik_user.t3_provision.id
  permission = "authentik_core.view_group"
}

output "t3_provision_token" {
  value     = authentik_token.t3_provision.key
  sensitive = true
}
```
- [ ] **Step 2: Apply** (claim first)
```bash
~/code/scripts/presence claim stack:authentik --purpose "v2: read-only t3-provision token"
export VAULT_ADDR=https://vault.viktorbarzin.me && vault login -method=oidc
cd /home/wizard/code/infra/stacks/authentik && ../../scripts/tg apply -target=authentik_user.t3_provision -target=authentik_token.t3_provision -target=authentik_rbac_permission_user.t3_provision_view_user -target=authentik_rbac_permission_user.t3_provision_view_group --non-interactive
```
Expected: 4 added. (If the `authentik_rbac_permission_user` resource/permission codename differs in the installed provider, run `../../scripts/tg console` / check the provider docs and adjust the codename; verify in Step 3.)
- [ ] **Step 3: Store the token in Vault + verify it is read-only**
```bash
TOK=$(../../scripts/tg output -raw t3_provision_token)
vault kv patch secret/authentik t3_provision_token="$TOK"
# verify: can LIST T3 Users members...
curl -sk -H "Authorization: Bearer $TOK" "https://authentik.viktorbarzin.me/api/v3/core/users/?groups_by_name=T3%20Users" | jq -r '.results[].email'
# ...but CANNOT write (expect 403):
curl -sk -o /dev/null -w '%{http_code}\n' -X PATCH -H "Authorization: Bearer $TOK" -H 'Content-Type: application/json' -d '{"name":"x"}' "https://authentik.viktorbarzin.me/api/v3/core/users/14/"
```
Expected: the three emails listed; the PATCH returns `403`.
- [ ] **Step 4: Commit**
```bash
git add stacks/authentik/t3-provision-token.tf
git commit -m "workstation: read-only Authentik token for the t3-provision membership query"
```
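For reference, the membership query from Step 3 can be built with the Python standard library as well. This is a minimal sketch, assuming the same `groups_by_name` filter and bearer-token header as the `curl` call; `members_request()` is a hypothetical helper and the host in the example is a placeholder.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def members_request(base_url: str, group: str, token: str) -> Request:
    """Build (but do not send) the GET request that lists a group's users."""
    # quote_via=quote encodes the space in "T3 Users" as %20, as in the curl URL.
    query = urlencode({"groups_by_name": group}, quote_via=quote)
    return Request(
        f"{base_url.rstrip('/')}/api/v3/core/users/?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    req = members_request("https://authentik.example.test", "T3 Users", "example-token")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the URL encoding checkable without network access or a real token.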
---
## Task 4: setup-devvm.sh — stage the token for the root provisioner
**Files:** Modify `infra/scripts/workstation/setup-devvm.sh`
- [ ] **Step 1: Add a token-staging step** (after step 6, before the final `log "OK"`). The hourly provisioner runs as root with no Vault token, so `setup-devvm.sh` (run by wizard, who can read Vault) drops it to a root-only file.
```bash
# 8) stage the read-only Authentik token for the root provisioner's membership query.
if command -v vault >/dev/null; then
  export VAULT_ADDR="${VAULT_ADDR:-https://vault.viktorbarzin.me}"
  if tok="$(vault kv get -field=t3_provision_token secret/authentik 2>/dev/null)"; then
    install -m 0600 /dev/stdin /etc/t3-serve/authentik-token <<<"$tok"
    log "staged /etc/t3-serve/authentik-token (read-only Authentik API)"
  else
    log "WARN: t3_provision_token not in Vault -> Authentik membership query will be skipped"
  fi
fi
```
- [ ] **Step 2: Run + verify**
Run: `sudo bash /home/wizard/code/infra/scripts/workstation/setup-devvm.sh 2>&1 | grep -E 'authentik-token|OK'` then `sudo stat -c '%a %U' /etc/t3-serve/authentik-token`
Expected: "staged ... authentik-token" + `OK`; perms `600 root`.
- [ ] **Step 3: Commit**
```bash
git add scripts/workstation/setup-devvm.sh
git commit -m "workstation: setup-devvm.sh stages the read-only Authentik token (root-only)"
```
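The property Step 2 checks (a token file readable only by its owner) can be illustrated in isolation. The sketch below is a hypothetical Python analogue of the `install -m 0600` line, not a replacement for it; `stage_token()` is an illustrative name, and the demo writes to a temporary directory rather than `/etc/t3-serve`.

```python
import os
import stat
import tempfile


def stage_token(path: str, token: str) -> None:
    """Write the token to `path` with mode 0600, independent of the umask."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        # fchmod also tightens the mode if the file already existed.
        os.fchmod(fh.fileno(), 0o600)
        fh.write(token + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "authentik-token")
        stage_token(target, "example-token")
        print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```

The file is created with the restrictive mode from the start instead of being written and then tightened, so the token is never briefly world-readable. Ownership follows the invoking user, which is why the real script is run under `sudo`.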
---
## Task 5: Provisioner — source members from Authentik (replace roster.yaml)
**Files:** Modify `infra/scripts/t3-provision-users.sh`
- [ ] **Step 1: Add a members fetch and swap the engine call.** Replace the roster-read/derive block: fetch the members from Authentik (best-effort), build the members JSON `[{email, os_user}]`, and pass it to the engine through a new `derive-members` subcommand (which takes `--members-json`).
First extend the engine CLI (`_main` in `roster_engine.py`): add `derive-members`, which reads a members JSON, a ports JSON, and the admin emails, and emits the same desired-state JSON as `derive`.
```python
# in _main(), add a subparser:
pm = sub.add_parser("derive-members", help="desired state from an Authentik member list")
pm.add_argument("--members-json", required=True)
pm.add_argument("--ports-json", required=True)
pm.add_argument("--admin-emails", default="", help="comma-separated admin emails")

# ...in the dispatch:
if args.cmd == "derive-members":
    with open(args.members_json, encoding="utf-8") as fh:
        members = json.load(fh)
    with open(args.ports_json, encoding="utf-8") as fh:
        ports = json.load(fh)
    admins = {e for e in args.admin_emails.split(",") if e}
    ds = derive_desired_state(roster_from_members(members, admins), ports)
    json.dump(_desired_state_to_dict(ds), sys.stdout, indent=2, sort_keys=True)
    sys.stdout.write("\n")
    return 0
```
In `t3-provision-users.sh`, replace the `ROSTER`/validate/derive section with:
```bash
AUTHENTIK_URL="${AUTHENTIK_URL:-https://authentik.viktorbarzin.me}"
TOKEN_FILE="${TOKEN_FILE:-/etc/t3-serve/authentik-token}"
T3_GROUP="${T3_GROUP:-T3 Users}"
ADMIN_EMAILS="${WORKSTATION_ADMIN_EMAILS:-vbarzin@gmail.com}"
members_file="$(mktemp)"; trap 'rm -f "$ports_file" "$members_file" "${desired_file:-}"' EXIT
if [[ -r "$TOKEN_FILE" ]]; then
  tok="$(cat "$TOKEN_FILE")"
  if curl -sf -H "Authorization: Bearer $tok" --get \
       --data-urlencode "groups_by_name=$T3_GROUP" \
       "$AUTHENTIK_URL/api/v3/core/users/" \
     | jq -c '[.results[] | select(.is_active) | {email: .email, os_user: (.attributes.os_user // null)}]' \
       > "$members_file" && [[ -s "$members_file" ]]; then
    :
  else
    log "WARN: Authentik membership query failed -> no membership change this run"; echo '[]' > "$members_file"
    SKIP_RECONCILE=1
  fi
else
  log "WARN: $TOKEN_FILE absent -> no membership change this run"; echo '[]' > "$members_file"; SKIP_RECONCILE=1
fi
if [[ "${SKIP_RECONCILE:-0}" == 1 ]]; then log "reconcile skipped (no Authentik membership)"; exit 0; fi
desired_file="$(mktemp)"
python3 "$ENGINE" derive-members --members-json "$members_file" --ports-json "$ports_file" --admin-emails "$ADMIN_EMAILS" > "$desired_file"
jq -e . "$desired_file" >/dev/null || { echo "[t3-provision] derive-members produced invalid JSON" >&2; exit 1; }
```
(Keep steps 4-6 of the existing script — accounts/groups/clone/kubeconfig, .env/enable, regen map/dispatch — unchanged; they consume `$desired_file`.)
- [ ] **Step 2: shellcheck + DRY_RUN** (with the staged token present)
Run: `cd /home/wizard/code/infra/scripts && shellcheck -S warning t3-provision-users.sh && sudo DRY_RUN=1 bash t3-provision-users.sh 2>&1 | grep -iE 'clone|kubeconfig|reconcile|WARN'`
Expected: shellcheck clean; dry-run lists the current members, no account creations (all exist), "reconcile complete (DRY-RUN)".
- [ ] **Step 3: Real run + verify it reproduces current state**
Run: `sudo jq -S . /etc/t3-serve/dispatch.json > /tmp/d1; sudo DRY_RUN=0 bash t3-provision-users.sh >/dev/null 2>&1; sudo jq -S . /etc/t3-serve/dispatch.json > /tmp/d2; diff /tmp/d1 /tmp/d2 && echo SAME; id -nG emo`
Expected: `SAME` (dispatch content unchanged); emo groups unchanged. Redeploy: `sudo install -m0755 t3-provision-users.sh /usr/local/bin/t3-provision-users`.
- [ ] **Step 4: Commit**
```bash
git add scripts/t3-provision-users.sh scripts/workstation/roster_engine.py scripts/workstation/test_roster_engine.py
git commit -m "workstation: provisioner sources members from Authentik T3 Users (replaces roster.yaml)"
```
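The `jq` filter and the skip-on-failure rule from Step 1 can be read as one small function. The sketch below is a hypothetical Python rendering for illustration only (the provisioner itself stays Bash + `jq`). `members_from_response()` is an illustrative name, and the sample payload contains only the fields the filter reads, not the full Authentik response.

```python
from __future__ import annotations

import json


def members_from_response(body: str) -> list[dict] | None:
    """Return [{email, os_user}] for active users, or None if the body is unusable.

    None corresponds to the shell's SKIP_RECONCILE=1 branch: no membership
    change is made when the query fails.
    """
    try:
        results = json.loads(body)["results"]
    except (ValueError, KeyError, TypeError):
        return None
    return [
        {"email": u.get("email"), "os_user": (u.get("attributes") or {}).get("os_user")}
        for u in results
        if u.get("is_active")
    ]


if __name__ == "__main__":
    sample = json.dumps({"results": [
        {"email": "a@x.com", "is_active": True, "attributes": {"os_user": "amy"}},
        {"email": "b@x.com", "is_active": False, "attributes": {}},
    ]})
    print(members_from_response(sample))
    print(members_from_response(""))  # failed query -> None -> skip reconcile
```

As in the shell version, a valid response with zero results yields an empty list rather than a skip, because `[]` is a non-empty file for the `-s` test. In that case the reconcile proceeds with no members.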
---
## Task 6: Authentik — Authentik-managed membership + legacy os_user attributes
**Files:** Modify `infra/stacks/authentik/t3-users.tf`; set user attributes via API.
- [ ] **Step 1: Set the legacy os_user attributes** (the 3 existing accounts don't derive from their emails). Read-merge-write so existing attributes are preserved (Authentik PATCH replaces the `attributes` dict).
```bash
export VAULT_ADDR=https://vault.viktorbarzin.me
TOK=$(vault kv get -field=tf_api_token secret/authentik)
A=https://authentik.viktorbarzin.me/api/v3
set_os_user() { # $1=username $2=os_user
  local pk attrs
  pk=$(curl -sk -H "Authorization: Bearer $TOK" "$A/core/users/?username=$1" | jq '.results[0].pk')
  attrs=$(curl -sk -H "Authorization: Bearer $TOK" "$A/core/users/$pk/" | jq -c --arg o "$2" '.attributes + {os_user:$o}')
  curl -sk -X PATCH -H "Authorization: Bearer $TOK" -H 'Content-Type: application/json' \
    -d "{\"attributes\":$attrs}" "$A/core/users/$pk/" | jq -r '.username + " os_user=" + .attributes.os_user'
}
set_os_user "vbarzin@gmail.com" wizard
set_os_user "emil.barzin@gmail.com" emo
set_os_user "ancaelena98@gmail.com" ancamilea
```
Expected: three lines confirming `os_user=` each.
- [ ] **Step 2: Drop the HCL `users` list** so membership is Authentik-managed. Edit `t3-users.tf`: remove the `users = [...]` argument from `resource "authentik_group" "t3_users"`, and remove the `data "authentik_user"` lookups too if they are now unused. Leave the group resource in place (name only).
```hcl
resource "authentik_group" "t3_users" {
name = "T3 Users"
# Membership is managed in Authentik (UI/API), not Terraform — the devvm
# provisioner reconciles workstation accounts from this group's members.
}
```
- [ ] **Step 3: Apply + verify members unchanged**
```bash
cd /home/wizard/code/infra/stacks/authentik && ../../scripts/tg apply -target=authentik_group.t3_users --non-interactive
curl -sk -H "Authorization: Bearer $TOK" "$A/core/groups/?search=T3%20Users" | jq -r '.results[0].users_obj[].username'
```
Expected: apply shows the group updated (no member change / the `users` field no longer managed); the 3 members still listed.
- [ ] **Step 4: Commit**
```bash
git add stacks/authentik/t3-users.tf
git commit -m "workstation: T3 Users membership is Authentik-managed (drop HCL member list)"
```
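The read-merge-write in Step 1 matters because the PATCH replaces the whole `attributes` dict. The merge step can be sketched as a pure function. `patch_body_with_os_user()` is a hypothetical helper for illustration, and the sample attributes are made up.

```python
import json


def patch_body_with_os_user(current_attributes: dict, os_user: str) -> str:
    """Return a PATCH body that keeps existing attributes and sets os_user."""
    # Build a new dict so the caller's copy of the attributes is not mutated.
    merged = {**current_attributes, "os_user": os_user}
    return json.dumps({"attributes": merged}, sort_keys=True)


if __name__ == "__main__":
    existing = {"settings": {"locale": "en"}}  # made-up existing attributes
    print(patch_body_with_os_user(existing, "emo"))
    # Sending only the new key would drop everything else:
    print(json.dumps({"attributes": {"os_user": "emo"}}))
```

The second line printed shows the body to avoid: it carries only `os_user`, so any attribute already on the user would be lost when the dict is replaced.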
---
## Task 7: Retire roster.yaml + update docs
**Files:** Delete `infra/scripts/workstation/roster.yaml`; modify `service-catalog.md`, `multi-tenancy.md`.
- [ ] **Step 1: Confirm nothing reads roster.yaml anymore**
Run: `grep -rn 'roster.yaml\|roster_engine.*roster\b' /home/wizard/code/infra/scripts /home/wizard/code/infra/docs | grep -v 'load_roster\|test_\|design.md\|-plan.md'`
Expected: no live references in the provisioner (the engine keeps `load_roster` for tests, that's fine).
- [ ] **Step 2: Delete it + update the service-catalog t3code row** — change "Source of truth = roster.yaml" to "Source of truth = the Authentik `T3 Users` group (members → accounts via the read-only API token); `os_user` from the email or a per-user `os_user` attribute". Update the multi-tenancy Workstation section's "single source of truth" line likewise.
```bash
git rm scripts/workstation/roster.yaml
# (edit service-catalog.md + multi-tenancy.md per above)
```
- [ ] **Step 3: Commit**
```bash
# roster.yaml's deletion is already staged by `git rm` in Step 2; re-adding the removed path would fail.
git add .claude/reference/service-catalog.md docs/architecture/multi-tenancy.md
git commit -m "workstation: retire roster.yaml — Authentik T3 Users group is the membership SSoT"
```
---
## Task 8: End-to-end smoke (add + remove a throwaway member)
- [ ] **Step 1: Add a throwaway test member** to `T3 Users` in Authentik (a test user, or temporarily add an existing one), set no `os_user` attribute. Run `sudo /usr/local/bin/t3-provision-users` and confirm an account `<derived>` is created (`id <derived>`), with a locked `~/code` (secret file shows `GITCRYPT`) and `~/.kube/config`.
- [ ] **Step 2: Remove the test member** from the group; run the reconcile; confirm they drop out of `/etc/ttyd-user-map` + `dispatch.json` (the reversible cut). Leave `userdel` to the gated offboarding runbook.
- [ ] **Step 3: Verify the 3 real users are intact:** `id emo` (groups unchanged); emo/ancamilea/wizard still in `dispatch.json`; their `t3-serve@` active; emo's locked clone and ancamilea's clone intact.
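The add/remove checks in Steps 1-3 amount to comparing two snapshots of the dispatch map. This is a minimal sketch, assuming `dispatch.json` has the shape the Task 2 test asserts (`authentik_user` mapped to `{os_user, port}`); `dispatch_diff()`, the `smoke.test` entry, and its port are hypothetical.

```python
from __future__ import annotations


def dispatch_diff(before: dict, after: dict) -> tuple[set, set, set]:
    """Return (added, removed, changed) dispatch keys between two snapshots."""
    added = set(after) - set(before)
    removed = set(before) - set(after)
    changed = {k for k in set(before) & set(after) if before[k] != after[k]}
    return added, removed, changed


if __name__ == "__main__":
    before = {
        "emil.barzin": {"os_user": "emo", "port": 3774},
        "smoke.test": {"os_user": "smoke_test", "port": 3776},  # hypothetical throwaway member
    }
    after = {"emil.barzin": {"os_user": "emo", "port": 3774}}
    added, removed, changed = dispatch_diff(before, after)
    print("added:", added, "removed:", removed, "changed:", changed)
```

After removing the throwaway member, a clean result would have only that member in `removed` and nothing in `changed`, which is the "real users intact" condition of Step 3.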
---
## Self-review
- **Spec coverage:** Authentik-as-SSoT (Tasks 5,6) · email identity + os_user derive/override (Tasks 1,6) · provisioner reads the API (Task 5) · read-only token for the root timer (Tasks 3,4) · roster.yaml retires (Task 7) · k8s_users/cluster untouched (no task touches it) · wizard special-cased (admin_emails, Task 2). All covered.
- **Type consistency:** `derive_os_user(email, os_user_attr)` and `roster_from_members(members, admin_emails)` used consistently; `members` dicts are `{email, os_user}`; reuses the existing `User`/`Roster`/`derive_desired_state`/`DesiredState`.
- **apiserver-OIDC:** out of scope here (kubectl auth method only) — flagged in the design; the generic kubeconfig task is unchanged from v1.
- **Open risk:** the `authentik_rbac_permission_user` resource name / permission codenames may differ in the installed provider version (Task 3). Step 2 says to adjust the codename if needed, and Step 3 verifies read-works/write-403.