t3: prepare to adopt 0.0.25 — version-agnostic dispatch + real pairing health-check + state backup [ci skip]

Investigated the 0.0.25 break: it is ONLY an endpoint rename
(/api/auth/bootstrap -> /api/auth/browser-session). The rest of the pairing
contract (credential payload, t3_session cookie, /api/auth/session) is
byte-identical, verified in isolated 0.0.24-vs-0.0.25 sandbox serves. So a
future pin bump is now safe + reversible (pin STAYS 0.0.24 — this is prep):
- t3-dispatch: autoPair tries /api/auth/browser-session, falls back to
/api/auth/bootstrap on 404 — one binary pairs across both versions and any
rolling-restart skew. TDD via TestAutoPairAcrossVersions (red on 0.0.25
before, green after). Built, deployed, verified live on 0.0.24 (all three
users still 302 + t3_session via the fallback).
- t3-autoupdate.sh: health-check now exercises the REAL mint->credential->cookie
handshake (was GET / -> 200, which passed the pairing-broken nightly). A bad
build now auto-rolls-back. Validated against both versions.
- t3-backup-state.{sh,service,timer}: daily online VACUUM INTO of each ~/.t3
state.sqlite (was the only copy, unbacked) -> the one-way forward schema
migration becomes a restore, not sqlite surgery. timeout-guarded.
- runbooks/t3-version-bump.md: the reversible cutover checklist.
- post-mortem #5 (health-check) DONE + #6 added; service-catalog updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 5ea238c707
commit bccaa08d8e
9 changed files with 311 additions and 19 deletions
@@ -106,12 +106,29 @@ first run.
 live session and all projection history were untouched. Backup:
 `/home/wizard/.t3/userdata/auth-backup-*.sql`.

-### 5. End-to-end pairing health-check (DEFERRED)
+### 5. End-to-end pairing health-check (DONE — 2026-06-09 follow-up)

-The smoke test should exercise mint→bootstrap→cookie, not just `GET /`. Not
-done here (the pin makes it moot for the known-good build); needed before the
-enforcer is ever pointed at a new version. A blackbox probe on the dispatch
-auto-pair (expect 302 + `t3_session`) would have alerted within minutes.
+`t3-autoupdate.sh`'s smoke test now exercises the REAL handshake — mint →
+`POST` the credential (trying `browser-session` then `bootstrap`) → require
+`200` + a `t3_session` cookie — not just `GET / → 200`. A build that renames or
+breaks the pairing API now fails the check and **auto-rolls-back**, instead of
+shipping a pairing-broken binary to everyone.

+### 6. Version-agnostic dispatch + reversible bumps (DONE — "prepare for 0.0.25")
+
+So the pin can move without another outage:
+- **`t3-dispatch` is now version-agnostic** — `autoPair` tries
+`/api/auth/browser-session` (0.0.25) and falls back to `/api/auth/bootstrap`
+(0.0.24), so one binary pairs across the rename and through rolling-restart
+skew. Covered by `TestAutoPairAcrossVersions`. Investigation confirmed the
+0.0.25 break was *only* this endpoint rename — the rest of the contract
+(credential payload, `t3_session` cookie, `/api/auth/session`) is byte-identical.
+- **`~/.t3` state is now backed up** — `t3-backup-state` (daily timer, online
+`VACUUM INTO`, timeout-guarded) snapshots each user's `state.sqlite` (previously
+the only copy, unbacked). This turns the one-way forward migration into a
+*restore*, not sqlite surgery.
+- **Cutover is a checklist** — `docs/runbooks/t3-version-bump.md` (pre-flight
+verify, pre-bump backup, enforcer install + auto-rollback, verify, restore).
+
 ## Lessons