t3: pin t3@0.0.24 + stop nightly auto-update (auth-outage fix) [ci skip]
The t3-autoupdate timer (re-enabled by the provisioner's step 5b with `--now`, which fires the missed daily job immediately on a Persistent timer) pulled t3@nightly 0.0.25 mid-day. That build ran forward schema migrations on every ~/.t3 state.sqlite (auth_pairing_links/auth_sessions role->scopes, +proof_key_thumbprint) AND changed the bootstrap API, breaking t3-mint/pairing for ALL devvm users (pair prompt, no session).

- t3-autoupdate.sh: now a pinned-version ENFORCER (T3_PIN=0.0.24), not a nightly tracker -- re-asserts the pin (a no-op when correct).
- t3-provision-users.sh step 5b: drop `--now` (it triggered the immediate missed-job run that pulled the bad build).
- setup-devvm.sh: install pinned t3@0.0.24 at machine setup.
- unit Descriptions + service-catalog reflect the pin.
- post-mortem: 2026-06-09-t3-nightly-autoupdate-auth-outage.md.

Host already reconciled out-of-band: rolled back to 0.0.24, re-enabled the (now-pinned) enforcer, reset the 2 new users' disposable DBs, surgically reverted wizard's auth tables to level-30 (96 threads + live session preserved). All users verified 302 + t3_session.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 2125651aaa
commit 5ea238c707
7 changed files with 174 additions and 13 deletions

@@ -1,5 +1,5 @@
 [Unit]
-Description=Track latest t3 nightly (health-checked, idle-only restart)
+Description=Enforce pinned t3 version (health-checked, idle-only restart)
 After=network-online.target
 Wants=network-online.target

@@ -1,20 +1,30 @@
 #!/usr/bin/env bash
-# Track the latest t3 nightly — with a health-check + auto-rollback (lesson from
-# the Keel auto-update incidents: never blindly trust a new build) and idle-only
-# restarts (never kill an in-flight coding session). Runs as root via the unit.
+# Enforce the PINNED t3 version ($T3_PIN) across the box — NOT "latest/nightly".
+# t3 is pre-1.0 and ships breaking schema-migration + bootstrap-API changes between
+# builds that our t3-dispatch can't follow blind. 2026-06-09: a nightly auto-update
+# (0.0.25) migrated every ~/.t3 state.sqlite forward (auth_pairing_links/auth_sessions
+# role->scopes) AND changed the bootstrap API, breaking mint/pairing for ALL users.
+# So we PIN; this unit just re-asserts the pin (a no-op when already correct) with a
+# health-check + auto-rollback and idle-only restarts (never kill an in-flight session).
+# To move the pin: bump T3_PIN AND first verify t3-dispatch's bootstrap flow against the
+# new build (curl the dispatch -> expect 302 + Set-Cookie t3_session). See post-mortem
+# 2026-06-09-t3-nightly-autoupdate-auth-outage.md.
+# CAVEAT: the health-check below only probes GET / (200) — it does NOT exercise the
+# mint/bootstrap/pairing path, so it will NOT catch an auth regression on its own.
 set -uo pipefail
+T3_PIN="${T3_PIN:-0.0.24}" # known-good, t3-dispatch-compatible (2026-06-09 post-mortem)
 LOG() { logger -t t3-autoupdate "$*"; echo "t3-autoupdate: $*"; }
 
 ver() { t3 --version 2>/dev/null | awk '{print $NF}' | sed 's/^v//'; }
 
-before=$(ver); LOG "current: ${before:-unknown}"
-npm i -g t3@nightly >/dev/null 2>&1 || { LOG "npm install failed; staying on ${before:-current}"; exit 0; }
+before=$(ver); LOG "current: ${before:-unknown}; pin: $T3_PIN"
+npm i -g "t3@$T3_PIN" >/dev/null 2>&1 || { LOG "npm install failed; staying on ${before:-current}"; exit 0; }
 after=$(ver)
 
 if [[ -z "$after" || "$after" == "$before" ]]; then
-  LOG "already latest (${before:-?}); nothing to do"; exit 0
+  LOG "already at pin $T3_PIN (${before:-?}); nothing to do"; exit 0
 fi
-LOG "installed $after (was $before); health-checking…"
+LOG "re-pinned to $after (was $before); health-checking…"
 
 # Health-check the NEW binary on a throwaway port/base-dir before trusting it.
 SMOKE_PORT=3799; SMOKE_DIR=$(mktemp -d)

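The CAVEAT in the header comment above notes that the GET / health-check does not exercise the bootstrap path. Below is a minimal sketch of the check that the comment describes (a 302 plus a Set-Cookie for t3_session), assuming the response headers are captured with `curl -sI`. The helper name `auth_headers_ok` is hypothetical, and the dispatch URL is not shown in this diff, so it is left as a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): reads HTTP response headers on
# stdin and succeeds only if the first status line is a 302 AND some Set-Cookie
# header sets t3_session.
auth_headers_ok() {
  awk '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    NR == 1 && $2 == "302" { redirect = 1 }     # e.g. "HTTP/1.1 302 Found"
    tolower($0) ~ /^set-cookie:[ \t]*t3_session=/ { cookie = 1 }
    END { exit (redirect && cookie) ? 0 : 1 }
  '
}

# Intended use (DISPATCH_URL is a placeholder you must supply):
#   curl -sI "$DISPATCH_URL" | auth_headers_ok || echo "auth bootstrap regression"
```

Because the helper only inspects headers, it can be exercised against canned responses before a pin bump, independent of the GET / smoke test.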
@@ -1,5 +1,5 @@
 [Unit]
-Description=Daily t3 nightly auto-update
+Description=Daily t3 pinned-version enforcer (re-asserts T3_PIN; no-op when correct)
 
 [Timer]
 OnCalendar=*-*-* 04:00:00

@@ -191,9 +191,12 @@ while IFS=$'\t' read -r os_user port; do
   id "$os_user" >/dev/null 2>&1 && run systemctl enable --now "t3-serve@$os_user.service" >/dev/null 2>&1 || true
 done < <(jq -r '.ports | to_entries[] | [.key, .value] | @tsv' "$desired_file")
 
-# 5b) machine-wide (once, not per-user): keep the t3 nightly auto-updater enabled so it
-# self-heals hourly — a `disabled` timer silently freezes every instance on an old build.
-run systemctl enable --now t3-autoupdate.timer >/dev/null 2>&1 || true
+# 5b) machine-wide (once, not per-user): keep the t3 pinned-version ENFORCER enabled (it
+# re-asserts T3_PIN daily; a no-op when already correct). NOT --now: with Persistent=true
+# a `--now` enable fires the missed daily job IMMEDIATELY, which on 2026-06-09 pulled a
+# breaking nightly mid-day and took out auth for everyone. `enable` (no --now) just arms
+# the 04:00 schedule; fresh boxes get t3 from setup-devvm.sh's pinned install, not here.
+run systemctl enable t3-autoupdate.timer >/dev/null 2>&1 || true
 
 # 6) regenerate /etc/ttyd-user-map + dispatch.json from the desired state (SSoT:
 # a roster entry removed here DISAPPEARS, which is what the offboarding cut relies on)

@@ -33,6 +33,16 @@ if [[ $need_node -eq 1 ]]; then
 fi
 command -v claude >/dev/null || { log "npm: installing @anthropic-ai/claude-code"; npm install -g @anthropic-ai/claude-code >/dev/null; }
 
+# 2b) t3 (the per-user coding surface) — PINNED, never nightly/latest. t3 is pre-1.0 and
+# ships breaking auth-schema + bootstrap-API changes our t3-dispatch can't follow blind
+# (2026-06-09 outage: a nightly auto-update broke pairing for ALL users). The daily
+# t3-autoupdate ENFORCER re-asserts this same pin; install it here so a fresh box has t3
+# immediately. Keep T3_PIN in sync with t3-autoupdate.sh.
+T3_PIN="${T3_PIN:-0.0.24}"
+if [[ "$(t3 --version 2>/dev/null | awk '{print $NF}' | sed 's/^v//')" != "$T3_PIN" ]]; then
+  log "npm: installing pinned t3@$T3_PIN"; npm install -g "t3@$T3_PIN" >/dev/null
+fi
+
 # 3) kubelogin (kubectl oidc-login) system-wide — NOT the apt 'kubelogin' (= Azure tool)
 if [[ ! -x /usr/local/bin/kubelogin ]]; then
   log "kubelogin: installing int128/kubelogin"

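Both setup-devvm.sh and t3-autoupdate.sh compare the installed version against T3_PIN using the same `awk '{print $NF}' | sed 's/^v//'` pipeline. The sketch below isolates that comparison so it can be checked against sample `t3 --version` strings. The function names and the sample strings are assumptions for illustration, since the real output format of `t3 --version` is not shown in this diff.

```shell
#!/bin/sh
# Same normalization as in the diff: take the last whitespace-separated field
# and strip one leading "v".
normalize_version() { awk '{print $NF}' | sed 's/^v//'; }

# Hypothetical helper: succeeds (exit 0) when the reported version string does
# NOT match the pin, i.e. when an install of the pinned version is needed.
#   $1 = raw text as printed by the version command (may be empty if t3 is missing)
#   $2 = pinned version, e.g. 0.0.24
needs_repin() {
  reported=$(printf '%s\n' "$1" | normalize_version)
  [ "$reported" != "$2" ]
}
```

An empty version string (t3 not installed) normalizes to an empty value, which never equals the pin, so a fresh box always takes the install branch.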