vault-token-renew runbook: document the self-heal behavior
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Drift guard section rewritten: admin-capable clobbers now self-heal at the nightly run (HEALED log line); weak clobbers keep the loud DRIFT failure; manual re-mint is only the weak-clobber recovery now.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent 4a7b6db806
commit d9717a53bf
2 changed files with 39 additions and 23 deletions
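The commit message describes two outcomes for a clobbered `~/.vault-token`: an admin-capable clobber self-heals (HEALED), while a weak one keeps the loud DRIFT failure. The decision can be sketched as a small pure function. This is a hypothetical illustration, not the code in `vault-token-renew.sh`. The function names, the comma-separated policy format, and the `ok`/`denied` mint result are assumptions that stand in for real Vault calls. The sanity check, atomic file replace, and stale-token sweep are left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the nightly decision described in the commit message.
# Not the real vault-token-renew.sh: Vault lookups and the mint attempt are
# replaced by plain arguments so the logic can be exercised offline.

# Drift guard: the token is ours only if its display name is token-devvm-wizard
# AND its policies include vault-admin.
# $1 = display name, $2 = comma-separated policies (assumed format).
sketch_is_ours() {
  local dn="$1" policies=",$2,"
  # Comma-wrapping makes the match exact, so e.g. "vault-admin-ro" is rejected.
  [[ "$dn" == "token-devvm-wizard" && "$policies" == *",vault-admin,"* ]]
}

# $1 = display name, $2 = policies, $3 = outcome of the re-mint attempt made
# with the foreign token's own authority ("ok" or "denied", assumed values).
# Prints the log tag and returns the exit status the systemd unit would see.
sketch_decide() {
  local dn="$1" policies="$2" mint="$3"
  if sketch_is_ours "$dn" "$policies"; then
    echo "OK"       # our periodic token: renew it
    return 0
  fi
  if [[ "$mint" == "ok" ]]; then
    echo "HEALED"   # admin-capable clobber: Vault allowed the re-mint
    return 0
  fi
  echo "DRIFT"      # weak clobber: mint denied, fail loud
  return 1
}
```

For example, `sketch_decide kubernetes-woodpecker-default default denied` prints `DRIFT` and returns 1, which matches the 2026-06-05 case. An OIDC clobber whose re-mint Vault allows prints `HEALED` and returns 0. The real script does not pass an outcome in like this. It attempts the mint and lets Vault's authz decide.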

@@ -82,33 +82,48 @@ tail -5 ~/.local/state/vault-token-renew.log # recent results
 A healthy log line looks like:
 `<ts> OK renewed (dn=token-devvm-wizard ttl=2764800s)` (ttl 2764800s = 768h).
 
-## Drift guard & recovery
+After an OIDC login you'll instead see, at the next nightly run:
+`<ts> HEALED: re-minted periodic token from foreign dn=oidc-… (revoked N stale periodic token(s))`
+— that's the self-heal working as designed.
+
+## Drift guard & self-heal
 
 `~/.vault-token` is the Vault CLI's default token sink, so **any** `vault login`
 overwrites it. Two confirmed clobber vectors:
 
 1. `vault login -method=oidc` → replaces it with a 7-day OIDC token (the renewer
-can't push past the OIDC role's 7-day `token_max_ttl`).
+can't push past the OIDC role's 7-day `token_max_ttl`). The infra docs
+prescribe this login before applies, so it recurs — it went unnoticed for
+weeks twice (2026-06-18→26, 2026-06-29→07-03) and read as "Vault expires
+weekly".
 2. A stray `vault login -method=kubernetes` (e.g. a headless agent flow) →
 writes a read-only `kubernetes-woodpecker-default` token (can read Vault but
-**cannot** write `secret/*`). This happened 2026-06-05 and went unnoticed for
-two days — reads worked, writes silently 403'd.
+**cannot** write `secret/*`). Happened 2026-06-05, unnoticed for two days.
 
-To stop the renewer from silently keeping a foreign token alive, it runs a
-**drift guard** first: it refuses to renew unless the token is
-`token-devvm-wizard` **and** carries `vault-admin`. On drift it logs loudly and
-exits non-zero (the systemd unit goes `failed`) rather than renewing someone
-else's token. Symptom in the log:
+Since 2026-07-03 the renewer **self-heals**
+(`docs/plans/2026-07-03-vault-token-self-heal-design.md`). On a foreign token
+it attempts the re-mint **with the clobbering token's own authority** and lets
+Vault's authz decide:
 
-`<ts> DRIFT: ~/.vault-token is dn=... policies=... Refusing to renew a foreign token. Re-mint: ...`
+- **Admin-capable clobber (OIDC login)** → re-mints the periodic token,
+sanity-checks it against the drift guard, atomically replaces
+`~/.vault-token`, revokes stale `token-devvm-wizard` leftovers
+(anti-sprawl), logs
+`HEALED: re-minted periodic token from foreign dn=… (revoked N stale periodic token(s))`
+and exits 0. The clobbering token is NOT revoked — it may still back a live
+login session; it ages out on its own.
+- **Weak clobber (read-only k8s token)** → the mint is denied; logs
+`DRIFT: … heal denied, foreign token lacks create authority …; investigate what wrote it`
+and exits non-zero (unit `failed`). Deliberately loud: this signals a
+misbehaving agent flow — exactly the 2026-06-05 case.
 
-**Recovery: re-mint** (the DRIFT log line contains the exact command) — run the
-[mint/re-mint](#mint--re-mint-the-token) block. The drift guard detects but does
-**not** auto-recover (a deliberate scope choice — version-only, no self-heal);
-recovery is the manual re-mint above.
+**Manual recovery** is only needed for the weak-clobber case (the DRIFT log
+line still contains the exact command) — run the
+[mint/re-mint](#mint--re-mint-the-token) block.
 
 ## Tests
 
-`infra/scripts/test-vault-token-renew.sh` unit-tests the drift-guard decision
+`infra/scripts/test-vault-token-renew.sh` unit-tests the drift-guard decision,
 the lookup-JSON parsers (including the exact 2026-06-05 woodpecker-clobber
-case). Run: `bash infra/scripts/test-vault-token-renew.sh`.
+case), and the self-heal's revoke filter (which stale periodic tokens a heal
+may sweep). Run: `bash infra/scripts/test-vault-token-renew.sh`.
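After this change, the runbook documents three log tags: `OK`, `HEALED`, and `DRIFT`. They are distinct enough to check mechanically, for instance on the last line of the log that the hunk header's `tail -5` reads. The sketch below assumes each line is a timestamp with no spaces followed by the tag, as in the documented examples. `classify_line` and `ttl_hours` are hypothetical helpers, not part of the script.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for reading vault-token-renew log lines.
# Assumes "<ts> TAG ..." with a timestamp that contains no spaces.

# Prints healthy / healed / drift / unknown for one log line.
classify_line() {
  local line="$1"
  local rest="${line#* }"   # drop the leading timestamp field
  case "$rest" in
    "OK "*)   echo "healthy" ;;
    HEALED*)  echo "healed" ;;   # self-heal after an admin-capable clobber
    DRIFT*)   echo "drift" ;;    # weak clobber: investigate, then re-mint
    *)        echo "unknown" ;;
  esac
}

# Prints the ttl=NNNs value of a line in whole hours; fails if absent.
ttl_hours() {
  local line="$1" re='ttl=([0-9]+)s'
  [[ "$line" =~ $re ]] || return 1
  echo $(( 10#${BASH_REMATCH[1]} / 3600 ))
}
```

For example, `classify_line "$(tail -n 1 ~/.local/state/vault-token-renew.log)"` reports the latest result. Given the healthy example line, `ttl_hours` prints `768`, which matches the runbook's 2764800s = 768h.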

@@ -1,10 +1,11 @@
 #!/usr/bin/env bash
-# Unit tests for the pure drift-guard functions in vault-token-renew.sh.
-# Sources the script (vtr_main is guarded) and exercises the decision logic that
-# decides whether ~/.vault-token is OUR periodic admin token (renew) or a foreign
-# token that clobbered the file (refuse, fail loud). This is exactly the logic
-# whose ABSENCE let the 2026-06-05 woodpecker-token clobber be silently renewed
-# for two days. Run: bash infra/scripts/test-vault-token-renew.sh
+# Unit tests for the pure functions in vault-token-renew.sh.
+# Sources the script (vtr_main is guarded) and exercises (a) the drift-guard
+# decision — is ~/.vault-token OUR periodic admin token (renew) or a foreign
+# clobber (heal / fail loud)? — whose ABSENCE let the 2026-06-05 woodpecker
+# clobber be silently renewed for two days, and (b) the self-heal's revoke
+# filter — which stale token-devvm-wizard tokens a heal may sweep.
+# Run: bash infra/scripts/test-vault-token-renew.sh
 set -uo pipefail
 DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 # shellcheck source=/dev/null
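The test header depends on the script's `vtr_main` being guarded, so that sourcing the script only defines its functions. One common bash idiom for such a guard compares `${BASH_SOURCE[0]}` with `$0`. The sketch below demonstrates it on a throwaway file. Whether `vault-token-renew.sh` uses exactly this comparison is an assumption, and `demo_pure` and `demo_main` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: guard a script's main so a test file can `source` it for its
# functions without triggering the real run. Names here are hypothetical.
set -uo pipefail

tmp="$(mktemp)"
cat >"$tmp" <<'EOF'
demo_pure() { echo "pure:$1"; }    # a pure helper the tests want
demo_main() { echo "main ran"; }   # side-effecting entry point
# Run main only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  demo_main "$@"
fi
EOF

sourced_out="$(source "$tmp")"   # sourced: guard is false, main is skipped
executed_out="$(bash "$tmp")"    # executed: guard is true, main runs
source "$tmp"                    # define demo_pure in this shell
rm -f "$tmp"
```

After this runs, `sourced_out` is empty, `executed_out` is `main ran`, and `demo_pure x` prints `pure:x`. This is the property that lets a test suite call pure functions in isolation.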
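Part (b) of the new test header is the revoke filter, which decides which stale `token-devvm-wizard` tokens a heal may sweep. The sketch below is hypothetical. It assumes the candidates arrive on stdin as `accessor display_name` pairs. In practice they might be gathered with `vault list auth/token/accessors` and `vault token lookup -accessor`, but that plumbing is also an assumption. The filter keeps only tokens with the periodic token's display name, and never the freshly minted one. As a result, the clobbering OIDC token, which the runbook says is NOT revoked, is never selected.

```bash
#!/usr/bin/env bash
# Hypothetical revoke filter (not the script's real function).
# stdin: one "accessor display_name" pair per line (assumed format).
# $1: accessor of the freshly minted periodic token, which is never swept.
# Prints only accessors of stale token-devvm-wizard tokens; foreign tokens,
# including the clobbering OIDC token, are never selected.
revoke_candidates() {
  local fresh="$1" accessor dn
  while read -r accessor dn; do
    [[ -z "$accessor" ]] && continue   # skip blank lines
    if [[ "$dn" == "token-devvm-wizard" && "$accessor" != "$fresh" ]]; then
      printf '%s\n' "$accessor"
    fi
  done
}
```

Each printed accessor could then be passed to `vault token revoke -accessor`. The filter itself stays a pure function, so it can be unit-tested without a Vault server.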