infra/docs/runbooks/offboard-user.md
Viktor Barzin 5d9417fbaa workstation: emo contribute access + Phase-5 cutover done; gate master (push=apply) [ci skip]
2026-06-10 09:30:41 +00:00

Runbook: Offboard a User

Removing a user can span two surfaces — the in-cluster namespace-owner model (Vault k8s_users / RBAC / namespace) and the devvm Workstation (roster / OS account / t3 instance). Both are staged: a reversible cut (revoke access, delete nothing) first, then an explicit, gated destructive removal. Do the reversible cut immediately; only do the destructive step once you're sure.

Architecture: ../architecture/multi-tenancy.md. Workstation design: ../plans/2026-06-07-multi-user-workstation-design.md.


Part A — DevVM Workstation offboarding

Workstation offboarding is driven by removing the user's entry from infra/scripts/workstation/roster.yaml. roster_engine.py offboard_plan computes the staged actions: the reversible cut, and the gated userdel_archive, which is never auto-applied.

A1. Reversible cut (revoke access; delete nothing)

  1. Delete the user's entry from roster.yaml; commit + push.
  2. Reconcile (sudo /usr/local/bin/t3-provision-users, or wait for the hourly timer). This regenerates /etc/ttyd-user-map + dispatch.json without the user → t3-dispatch now returns 403 for them. (Automated.)
  3. Disable their instance + lock login (manual today; Phase 7 will fold this into the reconcile):
    sudo systemctl disable --now t3-serve@<os_user>.service
    sudo passwd -l <os_user>
    
  4. Revoke git + group access (manual):
    # legacy secret-bearing group, if they were ever in it
    sudo gpasswd -d <os_user> code-shared
    # drop write access to the infra repo
    curl -X DELETE -H "Authorization: token <admin_pat>" \
      https://forgejo.viktorbarzin.me/api/v1/repos/viktor/infra/collaborators/<forgejo_login>
    # revoke their devvm git PAT (token name: devvm-infra-git; admin PAT may
    # manage other users' tokens — verified 2026-06-10; the CLI has no delete)
    curl -X DELETE -H "Authorization: token <admin_pat>" \
      https://forgejo.viktorbarzin.me/api/v1/users/<forgejo_login>/tokens/devvm-infra-git
    
    Note: already-running sessions keep their old group memberships until they are restarted. If the instance was not stopped in step 3, restart t3-serve@<os_user> to enforce the change immediately.
  5. Verify: they can no longer reach t3.viktorbarzin.me (302 → Authentik, then denied once removed from the T3 Users group — Part C) and cannot log in. Nothing is deleted; re-adding the roster entry + reconcile fully restores them.
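
Parts of the verification in step 5 can be scripted. The following is a minimal sketch, not part of the roster tooling: the function names `is_locked` and `verify_cut` are hypothetical. It assumes the `passwd -S` format in which the second field is `L` for a locked password (some distributions print `LK`), and that Forgejo's collaborator check endpoint answers 404 once the user is no longer a collaborator.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of the roster tooling.

# True if one line of `passwd -S` output reports a locked password.
# Assumption: field 2 is "L" (Debian/Ubuntu shadow) or "LK" (RHEL-style).
is_locked() {
  set -- $1            # split the status line into positional fields
  case "${2:-}" in
    L|LK) return 0 ;;
    *)    return 1 ;;
  esac
}

# Checks the reversible cut for one user. Prints each failure and returns
# non-zero if any check fails. Uses plain (global) variables to stay POSIX.
# Args: os_user forgejo_login admin_pat
verify_cut() {
  rc=0

  is_locked "$(sudo passwd -S "$1")" \
    || { echo "FAIL: password for $1 is not locked"; rc=1; }

  # `systemctl is-active --quiet` exits 0 only while the unit is active.
  if systemctl is-active --quiet "t3-serve@$1.service"; then
    echo "FAIL: t3-serve@$1.service is still active"; rc=1
  fi

  # Assumption: 204 means "is a collaborator", 404 means "is not".
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: token $3" \
    "https://forgejo.viktorbarzin.me/api/v1/repos/viktor/infra/collaborators/$2")
  [ "$code" = 404 ] || { echo "FAIL: collaborator check returned HTTP $code"; rc=1; }

  return "$rc"
}
```

Called as `verify_cut <os_user> <forgejo_login> <admin_pat>`, it makes no changes. It does not cover the Authentik edge gate, which still needs a manual check against t3.viktorbarzin.me.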

A2. Destructive removal (explicit, gated — NEVER automatic)

Only after the reversible cut and a deliberate decision:

sudo tar czf /mnt/backup/offboard/<os_user>-$(date +%Y%m%d).tar.gz /home/<os_user>
sudo userdel -r <os_user>          # removes home + mail spool — IRREVERSIBLE

Rollback before this step: re-add the roster entry + reconcile. After it: restore from the archive.
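
Because rollback after this step depends entirely on the archive, it may be worth refusing `userdel -r` unless the archive is non-empty and lists cleanly. A minimal sketch, assuming the archive path used above; the function name `archive_ok` is hypothetical.

```shell
#!/bin/sh
# Sketch only: `archive_ok` is a hypothetical helper name.

# True if the archive exists, is non-empty, and tar can list its full contents.
# Listing forces gzip to decompress the whole file, so truncation is caught.
archive_ok() {
  # $1 = path to the .tar.gz archive
  [ -s "$1" ] && tar tzf "$1" >/dev/null 2>&1
}

# Usage, left commented out so the destructive step stays a deliberate decision:
#   archive=/mnt/backup/offboard/${os_user}-$(date +%Y%m%d).tar.gz
#   sudo tar czf "$archive" "/home/${os_user}" \
#     && archive_ok "$archive" \
#     && sudo userdel -r "$os_user"
```

If /mnt/backup/offboard is readable only by root, run the check from a root shell. The check proves the archive is readable, not that it is complete; spot-check its contents before deleting the account.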


Part B — In-cluster (namespace-owner) offboarding

  1. Reversible cut: remove the user's Authentik group membership (edge/RBAC blocked) and their entry from the Vault k8s_users map (secret/platform).
  2. Apply: scripts/tg apply the vault, platform, and woodpecker stacks (drops the RBAC binding, Vault identity/policy, and per-user CI). Their OIDC kubeconfig stops authorizing immediately.
  3. Destructive (gated): deleting their namespace(s) removes all their workloads + data — back up first (PVCs, DBs), then delete only on explicit decision.
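
After the apply in step 2, the RBAC revocation can be confirmed by impersonating the user with `kubectl auth can-i`. This is a minimal sketch under assumptions: the function name `rbac_revoked` is hypothetical, the caller needs impersonation rights, and the subject passed in must match the name used in the binding exactly (OIDC usernames are often prefixed). If access was granted through a group rather than a user subject, the check would also need `--as-group`.

```shell
#!/bin/sh
# Sketch only: `rbac_revoked` is a hypothetical helper name.

# True only if the API server explicitly answers "no" for the impersonated user.
# An empty answer (kubectl failed, cluster unreachable) counts as unconfirmed.
rbac_revoked() {
  # $1 = RBAC subject name, $2 = namespace
  answer=$(kubectl auth can-i get pods --as "$1" -n "$2" 2>/dev/null) || true
  case "$answer" in
    no*) return 0 ;;   # "no", possibly followed by a reason
    *)   return 1 ;;   # "yes" or no usable answer
  esac
}
```

For example, `rbac_revoked <rbac_subject> <namespace> && echo revoked` prints `revoked` only when the API server denies the request.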

Part C — Authentik (both surfaces)

Remove the user from the relevant Authentik group(s) — kubernetes-namespace-owners (cluster) and/or T3 Users (workstation edge gate). This is the edge revocation; do it as part of the reversible cut so they're locked out at the front door.


Order of operations

Reversible cut on all relevant surfaces first (Authentik group → roster removal + reconcile → k8s_users removal + apply) → verify access is gone → only then the gated destructive steps (userdel -r, namespace deletion), each after its own archive.