#!/usr/bin/env bash
# Enforce the PINNED t3 version ($T3_PIN) across the box, NOT "latest"/nightly.
# t3 is pre-1.0 and ships breaking schema-migration and bootstrap-API changes between
# builds that our t3-dispatch can't follow blind. On 2026-06-09 a nightly auto-update
# (0.0.25) migrated every ~/.t3 state.sqlite forward (auth_pairing_links/auth_sessions:
# role -> scopes) AND changed the bootstrap API, breaking mint/pairing for ALL users.
# So we PIN. This unit only re-asserts the pin (a no-op when already correct). It adds
# a health-check with auto-rollback and idle-only restarts, and never kills an in-flight session.
# To move the pin: bump T3_PIN AND first verify t3-dispatch's bootstrap flow against the
# new build (curl the dispatch and expect a 302 with Set-Cookie: t3_session). See the
# post-mortem 2026-06-09-t3-nightly-autoupdate-auth-outage.md.
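# Pre-bump check, sketched as a comment (not executed). DISPATCH_URL is a hypothetical
# placeholder for wherever t3-dispatch is served on this box. `-D -` prints the
# response headers to stdout and `-o /dev/null` discards the body:
#   curl -s -o /dev/null -D - "$DISPATCH_URL" | grep -iE '^(HTTP/|set-cookie:[[:space:]]*t3_session=)'
# A compatible build should show an "HTTP/... 302" status line plus a t3_session cookie.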

# t3: prepare to adopt 0.0.25 (version-agnostic dispatch + real pairing health-check + state backup) [ci skip]
# Investigated the 0.0.25 break: it is ONLY an endpoint rename
# (/api/auth/bootstrap -> /api/auth/browser-session). The rest of the pairing
# contract (credential payload, t3_session cookie, /api/auth/session) is
# byte-identical, verified in isolated 0.0.24-vs-0.0.25 sandbox serves. So a
# future pin bump is now safe + reversible (pin STAYS 0.0.24; this is prep):
# - t3-dispatch: autoPair tries /api/auth/browser-session and falls back to
#   /api/auth/bootstrap on 404, so one binary pairs across both versions and any
#   rolling-restart skew. TDD via TestAutoPairAcrossVersions (red on 0.0.25
#   before, green after). Built, deployed, verified live on 0.0.24 (all three
#   users still 302 + t3_session via the fallback).
# - t3-autoupdate.sh: health-check now exercises the REAL mint->credential->cookie
#   handshake (was GET / -> 200, which passed the pairing-broken nightly). A bad
#   build now auto-rolls-back. Validated against both versions.
# - t3-backup-state.{sh,service,timer}: daily online VACUUM INTO of each ~/.t3
#   state.sqlite (was the only copy, unbacked) -> the one-way forward schema
#   migration becomes a restore, not sqlite surgery. timeout-guarded.
# - runbooks/t3-version-bump.md: the reversible cutover checklist.
# - post-mortem #5 (health-check) DONE + #6 added; service-catalog updated.
# Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# The health-check below exercises the REAL pairing handshake (mint -> credential
# exchange -> t3_session cookie) and mirrors t3-dispatch's endpoint fallback. A
# build that renames or breaks the pairing API therefore fails the check and
# auto-rolls-back. This closes the 2026-06-09 miss, where a GET / probe passed a
# pairing-broken build.

set -uo pipefail
T3_PIN="${T3_PIN:-0.0.26}" # known-good, t3-dispatch-compatible (2026-06-09 post-mortem)
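# One-off override, sketched as a comment (not executed). T3_PIN is taken from the
# environment when set, so a candidate build can be asserted by hand, e.g.:
#   T3_PIN=<candidate-version> ./t3-autoupdate.sh
# <candidate-version> is a placeholder. This assumes the script is invoked directly
# with rights to run `npm i -g` and `systemctl restart`. Such a run really installs
# globally and restarts idle t3-serve@ units, so work through
# runbooks/t3-version-bump.md first.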

LOG() { logger -t t3-autoupdate "$*"; echo "t3-autoupdate: $*"; }
# Installed t3 version with any leading "v" stripped (empty if t3 is missing or fails).
ver() { t3 --version 2>/dev/null | awk '{print $NF}' | sed 's/^v//'; }

before=$(ver); LOG "current: ${before:-unknown}; pin: $T3_PIN"
npm i -g "t3@$T3_PIN" >/dev/null 2>&1 || { LOG "npm install failed; staying on ${before:-current}"; exit 0; }
after=$(ver)
if [[ -n "$after" && "$after" == "$before" ]]; then
  LOG "already at pin $T3_PIN ($before); nothing to do"; exit 0
fi
# An empty $after means the new install can't even report its version. Fall through
# so the health-check and rollback handle it, instead of exiting 0 as "nothing to do".
LOG "re-pinned to ${after:-unknown} (was ${before:-unknown}); health-checking…"

# Health-check the NEW binary on a throwaway port/base-dir before trusting it.
# Gate 1 = liveness (GET / -> 200). Gate 2 = the REAL pairing handshake t3-dispatch
# performs (mint a credential -> POST it -> expect Set-Cookie: t3_session), trying the
# same endpoint fallback. Gate 2 catches a bootstrap-API rename or pairing regression.
SMOKE_PORT=3799; SMOKE_DIR=$(mktemp -d)
t3 serve --host 127.0.0.1 --port "$SMOKE_PORT" --base-dir "$SMOKE_DIR" >/dev/null 2>&1 &
smoke=$!; live=0; pair_ok=0
# Gate 1: poll GET / up to 15 times, sleeping 2 s after each failed attempt.
for _ in $(seq 1 15); do
  [[ "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://127.0.0.1:$SMOKE_PORT/" 2>/dev/null)" == "200" ]] && { live=1; break; }
  sleep 2
done
# Gate 2 only runs once the smoke server answered Gate 1.
if [[ "$live" == "1" ]]; then
  # Mint a 5-minute pairing credential in the throwaway base-dir and extract its
  # "credential" field from the JSON with sed.
  cred=$(t3 auth pairing create --base-dir "$SMOKE_DIR" --ttl 5m --json 2>/dev/null \
    | tr -d '\n ' | sed -n 's/.*"credential":"\([^"]*\)".*/\1/p')
  if [[ -n "$cred" ]]; then
    for ep in /api/auth/browser-session /api/auth/bootstrap; do  # mirror t3-dispatch's fallback
      hdr=$(curl -s -i --max-time 5 -X POST -H 'Content-Type: application/json' \
        -d "{\"credential\":\"$cred\"}" "http://127.0.0.1:$SMOKE_PORT$ep" 2>/dev/null)
      # HTTP status code, taken from the response's first (status) line.
      code=$(printf '%s' "$hdr" | sed -n '1s#.* \([0-9][0-9][0-9]\).*#\1#p')
      [[ "$code" == "404" ]] && continue  # endpoint absent in this build: try the next
      printf '%s' "$hdr" | grep -qi '^set-cookie:[[:space:]]*t3_session=' && pair_ok=1
      break
    done
  fi
fi
# Stop the throwaway server and discard its state.
kill "$smoke" 2>/dev/null; wait "$smoke" 2>/dev/null; rm -rf "$SMOKE_DIR"

if [[ "$live" != "1" || "$pair_ok" != "1" ]]; then
  LOG "HEALTH-CHECK FAILED for $after (live=$live pair=$pair_ok) — rolling back to $before"
  if [[ -n "$before" ]] && npm i -g "t3@$before" >/dev/null 2>&1; then
    LOG "rolled back to $before"
  else
    LOG "ROLLBACK FAILED — manual fix needed (t3 may be broken)"
  fi
  exit 1
fi

LOG "health OK (live + pairing handshake); restarting idle instances"

# Restart only IDLE per-user instances; defer any with an active agent child.
for unit in $(systemctl list-units --type=service --state=running --no-legend 't3-serve@*' | awk '{print $1}'); do
  pid=$(systemctl show -p MainPID --value "$unit")
  # pgrep -aP lists the unit's direct child processes with their full command lines.
  if [[ -n "$pid" && "$pid" != 0 ]] && pgrep -aP "$pid" 2>/dev/null | grep -qiE 'claude|codex|opencode'; then
    LOG "deferring $unit (active agent) — updates next cycle when idle"
  else
    systemctl restart "$unit" && LOG "restarted $unit -> $after"
  fi
done

LOG "update complete: $after"
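
# Reviewing past runs, sketched as a comment (not executed). LOG() tags every line via
# `logger -t t3-autoupdate`, so on this systemd/journald host the deferrals, rollbacks
# and restarts above should be retrievable with, e.g.:
#   journalctl -t t3-autoupdate --since today
#   journalctl -t t3-autoupdate --since today | grep -E 'deferring|HEALTH-CHECK FAILED|ROLLBACK'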