workstation: per-user long-lived Claude token to end concurrent-refresh logout
All checks were successful
ci/woodpecker/push/default Pipeline was successful
A heavy user (emo) runs 8+ always-on `claude` agents plus their t3-serve instance, all sharing one ~/.claude/.credentials.json. When the shared access token expires, the processes refresh simultaneously; OAuth refresh-token rotation makes the losing writer persist an EMPTY refresh token, logging the user out roughly every access-token lifetime (~8h). Re-issuing the credential never sticks because the race recurs (this is why emo's "standalone token" fix kept regressing).

Fix: an opt-in, per-user, non-rotating setup-token (sk-ant-oat01, ~1y, scope user:inference) kept in the user's OWN Vault path (field `setup_token`). claude-auth-sync materializes it to a user-owned ~/.config/claude-auth-sync/claude-oauth.env and, while it is present, SKIPS the rotating-credential validate/backup/restore (so no false WorkstationClaudeAuthInvalid). start-claude.sh and t3-serve@.service load it as CLAUDE_CODE_OAUTH_TOKEN, so every session of that user uses the non-rotating token and there is nothing to race on.

Fail-safe + opt-in: with no `setup_token` in Vault, every path is a no-op, so users on the normal per-user Enterprise-SSO flow are unaffected. This is each user's OWN identity, never the forbidden shared CLAUDE_CODE_OAUTH_TOKEN. Runbook documents enable/disable/rotate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 3cc8f9f661
commit c70810a51b
4 changed files with 117 additions and 2 deletions
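
The race described in the commit message can be reproduced in miniature. The following is a toy, single-process model and not the real OAuth exchange: `server_rotate`, `client_refresh`, and the `r1`/`r2` token names are all hypothetical, and the two "processes" are simply run one after the other after both have read the same stored token.

```bash
# Toy model of refresh-token rotation; every name here is hypothetical.
server_refresh="r1"   # the only refresh token the server currently accepts
stored_refresh="r1"   # stands in for the refresh token in .credentials.json
rotation=1
rotated=""

# Server side: accept only the current refresh token, then rotate it.
# Leaves the new token in $rotated, or an empty string on rejection.
server_rotate() {
    local presented="$1"
    if [[ "$presented" != "$server_refresh" ]]; then
        rotated=""
        return 1
    fi
    rotation=$((rotation + 1))
    server_refresh="r${rotation}"
    rotated="$server_refresh"
}

# Client side: refresh with the token it read earlier, then persist whatever
# came back. A rejected refresh therefore persists an empty token.
client_refresh() {
    local token_read_earlier="$1"
    server_rotate "$token_read_earlier" || true
    stored_refresh="$rotated"
}

# Both processes read the shared file before either one refreshes.
seen_by_a="$stored_refresh"
seen_by_b="$stored_refresh"

client_refresh "$seen_by_a"   # winner: server rotates r1 -> r2, file holds r2
client_refresh "$seen_by_b"   # loser: r1 is now stale, file ends up empty

printf 'stored refresh token after race: "%s"\n' "$stored_refresh"
```

In this model the last writer always wins, so the stored token ends up empty no matter how many times the credential is re-issued beforehand. A token that is never rotated removes the second step entirely, which is the approach this commit takes.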
@@ -80,8 +80,64 @@ sudo --preserve-env=VAULT_ADDR,VAULT_TOKEN /usr/local/bin/t3-provision-users
 ```
 
 Never copy another user's `.credentials.json` or scoped Vault token. Never restore
-the old shared `CLAUDE_CODE_OAUTH_TOKEN`; environment credentials outrank per-user
-login and would silently collapse all users onto one identity.
+a **shared** `CLAUDE_CODE_OAUTH_TOKEN` across users; environment credentials
+outrank per-user login and would silently collapse all users onto one identity.
+(A **per-user**, non-rotating setup-token tied to the user's OWN Enterprise
+identity is a different, sanctioned thing; see "Long-lived per-user token" below.)
+
+## Long-lived per-user token (heavy concurrent-agent users)
+
+The six-hourly renewal above assumes Claude owns refresh-token rotation in a
+single `~/.claude/.credentials.json`. A user who runs **many concurrent Claude
+sessions** (interactive tmux panes + their `t3-serve` instance + always-on
+`start-claude.sh` agents) breaks that assumption: when the shared access token
+expires, the processes refresh **simultaneously**, the OAuth server rotates the
+refresh token, and the losing writer persists an **empty** refresh token,
+logging the user out roughly every access-token lifetime (~8h). Re-issuing the
+credential does not help; the race recurs.
+
+The fix is a **per-user, long-lived setup-token** (`sk-ant-oat01-…`, ~1y,
+**non-rotating**). With `CLAUDE_CODE_OAUTH_TOKEN` set, Claude uses it directly and
+never touches `.credentials.json`, so there is nothing to race on. This is the
+user's OWN Enterprise identity (scope `user:inference`; local MCP servers are
+client-side and unaffected), stored only in their OWN Vault path. It is **NOT** the
+forbidden shared token, and it never crosses OS users.
+
+**Enable it (one-time, per user):**
+
+1. The user mints their own token (interactive Enterprise SSO):
+
+```bash
+claude setup-token # opens an SSO URL; paste the code back -> prints sk-ant-oat01-…
+```
+
+2. An admin stores it in that user's Vault path (MERGE, never `kv put`, which
+overwrites the whole secret: siblings like `claude_ai_oauth_json` / `vaultwarden_*` must survive):
+
+```bash
+vault kv patch -method=rw secret/workstation/claude-users/<os-user> \
+  setup_token=sk-ant-oat01-…
+```
+
+3. Materialize + activate (or just wait ≤6h for the timer):
+
+```bash
+systemctl start claude-auth-sync@<os-user>.service
+```
+
+`claude-auth-sync` writes `~/.config/claude-auth-sync/claude-oauth.env`
+(`CLAUDE_CODE_OAUTH_TOKEN=…`, mode 0600) and, while a token is present, **skips**
+the rotating-credential validate/backup/restore (so no false
+`WorkstationClaudeAuthInvalid`). `start-claude.sh` and `t3-serve@.service` load
+that env file. **Sessions started before activation keep the old credential
+until relaunched**: the user must restart their agents / `t3-serve` to cut over.
+
+**Disable it:** clear the field (`vault kv patch -method=rw
+secret/workstation/claude-users/<os-user> setup_token=""`); the next sync removes
+the env file and the user reverts to the per-user SSO credential flow.
+
+**Rotate before expiry:** setup-tokens expire 1y after mint. Re-mint (step 1) and
+re-store (step 2); the env file refreshes on the next sync.
 
 ## Verification
 
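
After step 3 of the runbook, the materialized file can be sanity-checked before agents are restarted. The helper below is a hypothetical sketch, not part of this commit: `check_token_env_file` is an invented name, and it assumes GNU `stat` (as on a Linux workstation) and the single-line `CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-…` format the runbook describes.

```bash
# Hypothetical helper: succeed only if the file looks like the env file the
# runbook describes (regular file, mode 0600, one CLAUDE_CODE_OAUTH_TOKEN line).
check_token_env_file() {
    local file="$1" mode
    [[ -f "$file" ]] || { echo "missing: $file" >&2; return 1; }
    mode="$(stat -c '%a' "$file")" || return 1   # GNU stat; prints e.g. 600
    [[ "$mode" == "600" ]] || { echo "mode is $mode, expected 600" >&2; return 1; }
    [[ "$(wc -l < "$file")" -eq 1 ]] || { echo "expected exactly one line" >&2; return 1; }
    # Only the prefix is checked; the token value itself is never printed.
    [[ "$(<"$file")" == CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-* ]] || {
        echo "unexpected contents" >&2
        return 1
    }
}
```

Run as the target user, `check_token_env_file "$HOME/.config/claude-auth-sync/claude-oauth.env" && echo OK` would confirm the file is in place. It says nothing about whether already-running sessions have picked the token up; those still need a relaunch.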
@@ -11,6 +11,12 @@ Environment=HOME=/home/%i
 Environment=PATH=/usr/local/bin:/usr/bin:/bin:/home/%i/.local/bin
 Environment=NODE_ENV=production
 EnvironmentFile=/etc/t3-serve/%i.env
+# Optional per-user long-lived CLAUDE_CODE_OAUTH_TOKEN, materialized by
+# claude-auth-sync from the user's own Vault path. Non-rotating, so t3's
+# concurrent agent sessions can't race on OAuth refresh-token rotation and wipe
+# the shared ~/.claude/.credentials.json. Leading '-' = optional (absent for
+# users on the normal per-user Enterprise-SSO credential flow).
+EnvironmentFile=-/home/%i/.config/claude-auth-sync/claude-oauth.env
 WorkingDirectory=/home/%i
 ExecStart=/usr/bin/t3 serve --host 0.0.0.0 --port ${T3_PORT} --base-dir /home/%i/.t3
 Restart=on-failure
@@ -13,6 +13,10 @@ CAS_VAULT_TOKEN_FILE="${CLAUDE_AUTH_VAULT_TOKEN_FILE:-$CAS_CONFIG_DIR/vault-toke
 CAS_VAULT_PATH="${CLAUDE_AUTH_VAULT_PATH:-secret/workstation/claude-users/$CAS_USER}"
 CAS_STATE_DIR="${CLAUDE_AUTH_STATE_DIR:-$CAS_HOME/.local/state/claude-auth-sync}"
 CAS_LOG="$CAS_STATE_DIR/sync.log"
+# Where a long-lived per-user setup-token is materialized as an env file
+# (KEY=VALUE) for start-claude.sh + t3-serve@.service to load. Lives under the
+# already-ReadWritePaths config dir so the sandboxed service may write it.
+CAS_TOKEN_ENV_FILE="${CLAUDE_AUTH_TOKEN_ENV_FILE:-$CAS_CONFIG_DIR/claude-oauth.env}"
 
 cas_log() {
   mkdir -p "$CAS_STATE_DIR"
@@ -133,6 +137,41 @@ cas_restore() {
   cas_log "RECOVERED restored Claude OAuth state from Vault"
 }
 
+# A user-scoped, long-lived setup-token (`sk-ant-oat01-…`, ~1y, NON-rotating) may
+# be stored in this user's OWN Vault path (field `setup_token`). When present it
+# is the authoritative credential: it bypasses the shared
+# ~/.claude/.credentials.json OAuth refresh-token rotation entirely, which is the fix
+# for users running many concurrent Claude sessions (interactive + t3-serve + always-on
+# agents) that otherwise race on refresh and wipe each other's refresh token.
+# We materialize it to a user-owned env file that start-claude.sh and
+# t3-serve@.service load as CLAUDE_CODE_OAUTH_TOKEN. This is the user's OWN
+# Enterprise identity, NOT the forbidden legacy SHARED token; it never crosses
+# OS users. Returns 0 when a token is active, so the caller skips the
+# rotating-credential validate/backup/restore (probing the now-vestigial
+# credential would otherwise emit false WorkstationClaudeAuthInvalid alerts).
+cas_sync_setup_token() {
+  local token desired tmp
+  token="$(vault kv get -field=setup_token "$CAS_VAULT_PATH" 2>/dev/null)" || token=""
+  if [[ "$token" != sk-ant-oat01-* ]]; then
+    if [[ -e "$CAS_TOKEN_ENV_FILE" ]]; then
+      rm -f "$CAS_TOKEN_ENV_FILE"
+      cas_log "removed stale CLAUDE_CODE_OAUTH_TOKEN env (no setup-token in Vault)"
+    fi
+    return 1
+  fi
+  desired="CLAUDE_CODE_OAUTH_TOKEN=$token"
+  if [[ -r "$CAS_TOKEN_ENV_FILE" && "$(<"$CAS_TOKEN_ENV_FILE")" == "$desired" ]]; then
+    cas_log "OK long-lived setup-token active (CLAUDE_CODE_OAUTH_TOKEN current); credential checks skipped"
+    return 0
+  fi
+  tmp="$(mktemp "${CAS_TOKEN_ENV_FILE}.XXXXXX")" || { cas_log "FAIL could not stage token env file"; return 1; }
+  printf '%s\n' "$desired" > "$tmp"
+  chmod 0600 "$tmp"
+  mv "$tmp" "$CAS_TOKEN_ENV_FILE"
+  cas_log "OK long-lived setup-token active; CLAUDE_CODE_OAUTH_TOKEN materialized; credential checks skipped"
+  return 0
+}
+
 cas_main() {
   umask 077
   for bin in jq vault claude timeout flock; do
@@ -143,6 +182,11 @@ cas_main() {
   flock -n 9 || { cas_log "SKIP another sync is already running"; return 0; }
 
   cas_prepare_vault || return 1
+  # A long-lived per-user setup-token, if provisioned, is authoritative and
+  # non-rotating: materialize it and skip the rotating-credential dance.
+  if cas_sync_setup_token; then
+    return 0
+  fi
   if cas_live_auth_ok; then
     cas_backup
     return
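
`cas_sync_setup_token` stages the env file with `mktemp` next to the target and then renames it into place, so readers never see a half-written file. As written, the results of `printf`, `chmod`, and `mv` are not checked. The variant below is a sketch of how the same stage-then-rename step could fail closed; `write_env_atomically` is a hypothetical name and is not part of this commit.

```bash
# Hypothetical helper: atomically replace file $1 with the single line $2.
write_env_atomically() {
    local target="$1" line="$2" tmp
    # Stage next to the target so the final mv is a same-filesystem rename.
    tmp="$(mktemp "${target}.XXXXXX")" || return 1
    if printf '%s\n' "$line" > "$tmp" \
        && chmod 0600 "$tmp" \
        && mv -f "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"   # do not leave a staged copy of the secret behind
    return 1
}
```

A caller would use it as `write_env_atomically "$CAS_TOKEN_ENV_FILE" "$desired" || return 1` and log success only afterwards. Whether a failed write should fall back to the rotating-credential path or abort the sync is a design decision this sketch leaves open.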
@@ -93,6 +93,15 @@ ensure_onboarding() {
 }
 ensure_onboarding
 
+# Load a per-user long-lived CLAUDE_CODE_OAUTH_TOKEN if claude-auth-sync has
+# materialized one from this user's own Vault path. A non-rotating setup-token
+# sidesteps the shared ~/.claude/.credentials.json OAuth refresh-token race that
+# logs out users running many concurrent agents (interactive + t3 + always-on).
+# Absent file -> no-op (normal per-user Enterprise-SSO flow). The user's OWN
+# token; never shared between OS users.
+_oauth_env="$HOME/.config/claude-auth-sync/claude-oauth.env"
+if [ -r "$_oauth_env" ]; then set -a; . "$_oauth_env"; set +a; fi
+
 # Deliberately not `exec` so we can branch on the exit code: clean quit ends the
 # pane (ttyd closes the terminal); a crash drops to a shell so the tmux session
 # isn't destroyed-and-recreated in a ttyd auto-reconnect loop.
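
The loader above relies on `set -a` (allexport) so that a plain `KEY=VALUE` line in the env file becomes an exported variable that the `claude` child process inherits. The sketch below isolates that behaviour; `load_env_file` and `DEMO_TOKEN` are hypothetical names used only for illustration.

```bash
# Hypothetical helper mirroring the loader: export every assignment in the
# file, or do nothing when the file is absent or unreadable.
load_env_file() {
    local file="$1"
    if [ -r "$file" ]; then
        set -a        # auto-export variables assigned while sourcing
        . "$file"
        set +a
    fi
}

demo="$(mktemp)"
printf 'DEMO_TOKEN=example-value\n' > "$demo"
load_env_file "$demo"
# A child process sees the variable only because it was exported.
sh -c 'printf "child sees: %s\n" "$DEMO_TOKEN"'
rm -f "$demo"
```

Without `set -a`, sourcing the same file would set `DEMO_TOKEN` only in the current shell and the child would print an empty value. Because the file is sourced as shell code, it has to stay writable by its owner alone, which the 0600 mode on the materialized file provides.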