The cluster implements namespace-based multi-tenancy where each user receives their own Kubernetes namespace(s), RBAC roles, resource quotas, and CI/CD access. Onboarding is Vault-driven: add user metadata to `secret/platform → k8s_users`, apply Terraform stacks, and all resources (namespace, policies, RBAC, DNS, TLS) are auto-generated. Users access the cluster via OIDC authentication through Authentik and can self-service via k8s-portal.
## Architecture Diagram
```mermaid
graph TB
A[Admin: Add to Authentik Groups] --> B[Admin: Add to Vault k8s_users]
B --> C[Apply vault Stack]
C --> D[Apply platform Stack]
D --> E[Apply woodpecker Stack]
C --> C1[Create Namespace]
C --> C2[Create Vault Policy<br/>namespace-owner-user]
C --> C3[Create Vault Identity<br/>Entity + OIDC Alias]
C --> C4[Create K8s Deployer Role<br/>Vault K8s Auth]
D --> D1[Create RBAC RoleBinding<br/>Namespace Admin]
D --> D2[Create RBAC ClusterRoleBinding<br/>Cluster Read-Only]
D --> D3[Create ResourceQuota]
D --> D4[Create TLS Secret]
D --> D5[Create Cloudflare DNS]
E --> E1[Grant Woodpecker Admin]
F[User: Run Setup Script] --> F1[Install kubectl, kubelogin,<br/>Vault CLI, Terraform]
```
**Cause**: Forgejo username doesn't match Vault `k8s_users` key
**Fix**:
```bash
# Rename Forgejo user to match Vault key
# OR update k8s_users key to match Forgejo username, then terragrunt apply
```
### ResourceQuota: "Forbidden: exceeded quota"
**Cause**: User exceeded namespace quota
**Fix**:
```bash
# Check quota usage
kubectl describe quota -n alice-prod
# User must delete resources or request quota increase
# To increase: update k8s_users in Vault, apply platform stack
```
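To see which resources are close to the limit without reading the `describe` table by eye, the quota object can be inspected programmatically. The sketch below is a minimal illustration, assuming the input is one item from `kubectl get resourcequota -n alice-prod -o json`; the function names and the 90% threshold are hypothetical, and the quantity parser only covers the common suffixes.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes (not exhaustive; exponents are not handled).
_SUFFIXES = {
    "Ki": Decimal(2**10), "Mi": Decimal(2**20),
    "Gi": Decimal(2**30), "Ti": Decimal(2**40),
    "m": Decimal("0.001"), "k": Decimal(10**3),
    "M": Decimal(10**6), "G": Decimal(10**9), "T": Decimal(10**12),
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '500m' or '8Gi' to a plain number."""
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def over_threshold(quota: dict, threshold: float = 0.9) -> list[tuple[str, float]]:
    """Return (resource, used/hard) pairs at or above the threshold, sorted by name."""
    status = quota.get("status", {})
    hard, used = status.get("hard", {}), status.get("used", {})
    result = []
    for resource, limit in hard.items():
        limit_value = parse_quantity(limit)
        if limit_value == 0:
            continue  # a zero limit has no meaningful ratio
        ratio = float(parse_quantity(used.get(resource, "0")) / limit_value)
        if ratio >= threshold:
            result.append((resource, round(ratio, 3)))
    return sorted(result)
```

Any resource this returns is a candidate for cleanup or for a quota increase through `k8s_users`.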
### DNS Not Resolving
**Cause**: Cloudflare DNS not created by platform stack
**Fix**:
```bash
# Check domains in k8s_users
vault kv get -format=json secret/platform | jq -r '.data.data.k8s_users.alice.domains'
# Apply platform stack
cd stacks/platform
terragrunt apply
# Verify in Cloudflare dashboard
```
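Before checking the Cloudflare dashboard, it can help to confirm which of the user's domains actually fail to resolve. The following is a small sketch, assuming the domain list has already been read from `k8s_users`; `unresolved_domains` is a hypothetical helper, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable


def unresolved_domains(
    domains: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> list[str]:
    """Return the domains for which the resolver raises a lookup error."""
    missing = []
    for domain in domains:
        try:
            resolve(domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(domain)
    return missing
```

An empty result means DNS is fine from the machine running the check, which points the investigation elsewhere (ingress, TLS) rather than at the platform stack.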
### TLS Secret Missing
**Cause**: cert-manager failed to issue certificate
**Fix**:
```bash
# Check cert-manager logs
kubectl logs -n cert-manager deploy/cert-manager
# Check Certificate resource
kubectl get certificate -n alice-prod
# Check CertificateRequest
kubectl describe certificaterequest -n alice-prod
# If Let's Encrypt rate limited, wait 1 week or use staging
```
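cert-manager records issuance state in the `Ready` condition of each Certificate, so the failing certificates can be picked out directly. This is a minimal sketch, assuming the input is the parsed output of `kubectl get certificate -n alice-prod -o json`; `certificate_problems` is a hypothetical name.

```python
def certificate_problems(cert_list: dict) -> list[str]:
    """Summarise Certificates whose Ready condition is missing or not True."""
    problems = []
    for cert in cert_list.get("items", []):
        name = cert.get("metadata", {}).get("name", "<unnamed>")
        conditions = cert.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            problems.append(f"{name}: no Ready condition yet")
        elif ready.get("status") != "True":
            reason = ready.get("reason", "Unknown")
            message = ready.get("message", "")
            problems.append(f"{name}: {reason}: {message}")
    return problems
```

The `reason` and `message` fields usually say whether the problem is a pending challenge or a rejected order, which narrows down what to look for in the logs.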
### User Can't See Cluster Resources
**Cause**: ClusterRoleBinding not created
**Fix**:
```bash
# Check ClusterRoleBinding exists
kubectl get clusterrolebinding | grep alice
# Apply platform stack
cd stacks/platform
terragrunt apply
```
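Grepping binding names only works if the username is part of the name. A more precise check matches on the binding's subjects instead. The sketch below assumes the input is the parsed output of `kubectl get clusterrolebinding -o json` and that the subject name equals the user's OIDC username, which may not hold if the cluster applies a username prefix; `bindings_for_subject` is a hypothetical helper.

```python
def bindings_for_subject(crb_list: dict, subject_name: str) -> list[tuple[str, str]]:
    """Return (binding name, role name) for every binding that lists the subject."""
    matches = []
    for crb in crb_list.get("items", []):
        # "subjects" can be absent or null on a binding with no subjects.
        for subject in crb.get("subjects") or []:
            if subject.get("name") == subject_name:
                matches.append((crb["metadata"]["name"], crb["roleRef"]["name"]))
                break
    return matches
```

An empty result for the user confirms the binding is missing and that the platform stack needs to be applied.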
### Factory Pattern: New User Not Created
**Cause**: Module block not added to `factory/main.tf`
**Fix**:
```bash
# Edit factory/main.tf
cat >> stacks/actualbudget/factory/main.tf <<'EOF'
module "charlie" {
  source = "../"
  user   = "charlie"
  domain = "budget.charlie.viktorbarzin.me"
}
EOF
# Apply
cd stacks/actualbudget/factory
terragrunt apply
```
## DevVM Workstation (Claude Code multi-user)
Separate from the in-cluster namespace-owner model above, the **devvm** (`10.0.10.10`, VMID 102) hosts per-user **Claude Code Workstations** behind `t3.viktorbarzin.me`. It reuses the same identity backbone — the Vault `k8s_users` map and Authentik — but adds a devvm-side layer. Authoritative design + phased plan: `docs/plans/2026-06-07-multi-user-workstation-{design,plan}.md` (PRD: ViktorBarzin/infra#9).
**Single source of truth:** `infra/scripts/workstation/roster.yaml` (`os_user → authentik_user / k8s_user / tier / namespaces`). `roster_engine.py` (pytest-covered pure core) derives desired state; `t3-provision-users` (hourly timer) applies it — **additive-only** for existing users (never strips a group, replaces a home, or re-locks an account). `/etc/ttyd-user-map` + `dispatch.json` are **generated** from the roster (do not hand-edit).
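The additive-only rule can be illustrated with a small planning function. This is a sketch of the idea only, not the actual `roster_engine.py` API: the `Action` type, the `plan_groups` name, and the group names in the usage example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str      # "create_user" or "add_group"
    os_user: str
    detail: str = ""


def plan_groups(desired: dict[str, set[str]], current: dict[str, set[str]]) -> list[Action]:
    """Plan changes that only ever add: new users and missing groups."""
    actions = []
    for user in sorted(desired):
        wanted = desired[user]
        if user not in current:
            actions.append(Action("create_user", user, ",".join(sorted(wanted))))
            continue
        for group in sorted(wanted - current[user]):
            actions.append(Action("add_group", user, group))
        # Groups the user already has but the roster does not list are left alone.
    return actions
```

Because the function never emits a removal, running it hourly against a hand-modified account cannot strip anything from it; removal belongs to the separate offboarding path.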
**RBAC tiers:** `admin` (Viktor — cluster-admin, unlocked tree, secrets) · `power-user` (cluster-wide read-only, NO Secrets, via a dedicated `oidc-power-user-readonly` ClusterRole) · `namespace-owner` (admin in own namespace only). Each session acts as the user's **own** OIDC identity (kubelogin), never the admin's.
**Config inheritance (live):** wizard authors the base (his chezmoi-versioned `~/.claude`). Two native layers carry it to every user: the enforced org `claudeMd` in `/etc/claude-code/managed-settings.json` (top precedence, all sessions) and per-user `~/.claude/{skills,rules,…}` **symlinks** to the base (seeded via `/etc/skel`; edits propagate live). Secrets stay per-user at mode 600, never symlinked.
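The symlink layer can be sketched as follows. The real seeding happens through `/etc/skel`, so this is only an illustration of why edits to the base show up immediately in every home; `link_shared_dirs` and its arguments are hypothetical.

```python
from pathlib import Path


def link_shared_dirs(base: Path, home: Path, names=("skills", "rules")) -> list[Path]:
    """Symlink shared directories from the base into <home>/.claude, skipping existing entries."""
    claude_dir = home / ".claude"
    claude_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in names:
        link = claude_dir / name
        # exists() is False for a dangling link, so check is_symlink() as well.
        if link.exists() or link.is_symlink():
            continue  # never replace something the user already has
        link.symlink_to(base / name, target_is_directory=True)
        created.append(link)
    return created
```

Since each entry is a link rather than a copy, a file added under the base is visible through every user's `~/.claude` without a re-provision. Secret files are deliberately not in the `names` tuple.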
**Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo at `~/code` — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext. The provisioner clones anonymously from the public GitHub mirror; **contribute access is wired per-user on top** (see below). The apply boundary still holds (`scripts/tg apply` needs an admin Vault token + cluster RBAC), but **pushing `master` is NOT inert** — the Forgejo→Woodpecker webhook fires `.woodpecker/default.yml` (`event: push, branch: master`, `require_approval: forks` only), which terragrunt-applies changed stacks. `master` is therefore **branch-protected on Forgejo** (push + merge whitelists = `viktor`, deploy keys allowed): non-admins contribute via `<user>/<topic>` branches + PRs, and only an admin merge lands (and thus applies) their change. **Clones stay fresh automatically** (2026-06-10): the hourly `t3-provision-users` reconcile runs `refresh_locked_clone` (fetch all remotes + fast-forward `master`, ONLY when on master with a clean tree and an upstream — dirty trees and local commits are left alone with a WARN), and `start-claude.sh` does the same freshen at session launch (15s-capped so an offline remote never stalls the session).
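The freshen rule described above reduces to a small decision table. The sketch below is not the real `refresh_locked_clone` implementation: `CloneState` and `refresh_decision` are hypothetical names, and staying quiet (no WARN) when off `master` or without an upstream is an assumption, since the text only states a WARN for dirty trees and local commits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CloneState:
    branch: str
    dirty: bool          # uncommitted changes in the working tree
    has_upstream: bool   # current branch tracks a remote branch
    ahead: int           # local commits not yet on the upstream


def refresh_decision(state: CloneState) -> tuple[bool, Optional[str]]:
    """Return (fast_forward, warning). Fetching is assumed to happen regardless."""
    if state.branch != "master":
        return (False, None)
    if not state.has_upstream:
        return (False, None)
    if state.dirty:
        return (False, "WARN: dirty tree, leaving clone alone")
    if state.ahead > 0:
        return (False, "WARN: local commits on master, leaving clone alone")
    return (True, None)
```

Only the last branch changes the working tree, so a user's in-progress work is never touched by the hourly reconcile or by the session launcher.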
**Contribute access (per non-admin, manual — the anca/tripit PAT precedent):**
1. Add their Forgejo user as a **write** collaborator on `viktor/infra` (`PUT /api/v1/repos/viktor/infra/collaborators/<login>`).
2. Mint a PAT — the admin REST endpoint 404s here, use the in-pod CLI: `kubectl -n forgejo exec deploy/forgejo -- su -s /bin/sh git -c "forgejo admin user generate-access-token --username <login> --token-name devvm-infra-git --scopes 'write:repository'"`.
3. Install it in their `~/.git-credentials` (`https://<login>:<token>@forgejo.viktorbarzin.me`, mode 600) + `git config --global credential.helper store`, set `user.name`/`user.email`.
4. In their clone: `git remote add forgejo https://forgejo.viktorbarzin.me/viktor/infra.git` and `git branch --set-upstream-to=forgejo/master master` (origin stays the anonymous GitHub mirror).
5. Verify: branch push succeeds; a push to `master` is rejected with `Not allowed to push to protected branch`.
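Steps 1 and 3 can be scripted. The sketch below only builds the collaborator request and the credentials line and sends nothing. It assumes the Gitea-compatible API shape (a JSON body with a `permission` field and an `Authorization: token …` header); the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

FORGEJO = "https://forgejo.viktorbarzin.me"


def collaborator_request(login: str, admin_token: str) -> urllib.request.Request:
    """Build (but do not send) the PUT that adds a write collaborator on viktor/infra."""
    url = f"{FORGEJO}/api/v1/repos/viktor/infra/collaborators/{quote(login, safe='')}"
    body = json.dumps({"permission": "write"}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"token {admin_token}",
            "Content-Type": "application/json",
        },
    )


def credentials_line(login: str, token: str) -> str:
    """Format the ~/.git-credentials entry; the file itself must be mode 600."""
    host = FORGEJO.removeprefix("https://")
    return f"https://{quote(login, safe='')}:{quote(token, safe='')}@{host}"
```

Passing the returned request to `urllib.request.urlopen` would perform step 1. The PAT in step 2 still has to come from the in-pod CLI.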
**Status (2026-06-10):** built + verified on the live host — capacity (8 GiB swap), config inheritance, roster-driven provisioner, per-user locked clone, per-user OIDC kubeconfig + the `oidc-power-user-readonly` ClusterRole + emo's `k8s_users` entry (applied + impersonation-verified), the Authentik `T3 Users` edge gate, **the emo Phase-5 cutover (own clone + launcher repoint + `code-shared` removal, completed 2026-06-10) and emo's contribute access (`ebarzin` write collaborator + PAT + protected `master`)**. Per the live `/etc/skel` design, non-admin `~/.claude/{rules,skills}` symlinks into the admin base are **kept** (they ARE the shared-base delivery mechanism — the plan's step to remove them is obsolete). **Remaining (held / future):** the offboarding apply-side (Phase 7), per-user MCP/auth injection, and roster-reconciled `T3 Users` membership. See `../runbooks/offboard-user.md` for deprovisioning.