vault-token-renew: version the devvm renewer + user units in the repo
The devvm periodic Vault admin token (token-devvm-wizard, period=768h, policies default+sops-admin+vault-admin) is kept alive by a systemd user timer, but the renewer script + units lived only under ~/.local/bin and ~/.config/systemd/user — lost on a devvm rebuild. Move them into the repo as the source of truth so a rebuild can restore them. (version-only scope: behavior unchanged; no canonical-file/self-heal added.)
- scripts/vault-token-renew.{sh,service,timer}: renewer + user units, refactored into pure drift-guard functions + a guarded main (behavior identical; deployed live and verified still renewing with full write access).
- scripts/test-vault-token-renew.sh: unit-tests the drift guard + lookup-JSON parsing, incl. the 2026-06-05 woodpecker-clobber case (17 assertions).
- docs/runbooks/vault-token-renew-devvm.md: deploy, mint/re-mint, health-check, drift recovery.
- docs/architecture/secrets.md: correct the stale '~/.vault-token = OIDC token' description for devvm.
[ci skip]
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent f9d5cd6243
commit d4ec5768b2
6 changed files with 289 additions and 2 deletions
@@ -77,7 +77,7 @@ graph LR
 - Application configuration secrets
 - Encryption keys

-Authentication: `vault login -method=oidc` (Authentik SSO) → `~/.vault-token` → read by Vault Terraform provider.
+Authentication: `vault login -method=oidc` (Authentik SSO) → `~/.vault-token` → read by Vault Terraform provider. On `devvm`, `~/.vault-token` instead holds a long-lived **periodic** admin token auto-renewed daily by a systemd user timer (no weekly re-login) — see the [vault-token-renew-devvm runbook](../runbooks/vault-token-renew-devvm.md).

 ### External Secrets Operator (ESO)
@@ -260,7 +260,14 @@ spec:

 ### Terraform Provider Auth

-`~/.vault-token` created by `vault login -method=oidc`:
+The provider reads `VAULT_ADDR` from env and the token from `~/.vault-token`.
+That file is populated by `vault login -method=oidc` (humans, ad-hoc) — except
+on `devvm`, where it holds a long-lived **periodic** admin token (`display_name
+token-devvm-wizard`, `period=768h`, `explicit_max_ttl=0`, policies
+`default`+`sops-admin`+`vault-admin`) that a systemd user timer renews daily, so
+no weekly re-login is needed. A drift guard refuses to renew if a stray
+`vault login` clobbers the file with a foreign token. Deploy + recovery:
+[vault-token-renew-devvm runbook](../runbooks/vault-token-renew-devvm.md).

 ```hcl
 provider "vault" {
114
docs/runbooks/vault-token-renew-devvm.md
Normal file
@@ -0,0 +1,114 @@
# Runbook: devvm Vault token auto-renewal

**Host:** `devvm` (10.0.10.10), user `wizard`
**Source of truth:** `infra/scripts/vault-token-renew.{sh,service,timer}`
**Live paths:** `~/.local/bin/vault-token-renew`, `~/.config/systemd/user/vault-token-renew.{service,timer}`

## What this is

`wizard@devvm` authenticates to Vault with a **periodic, orphan** token stored
in `~/.vault-token`, instead of a 7-day OIDC login that needed weekly
re-auth. A systemd **user** timer renews it daily so it never expires.

| Property | Value |
|---|---|
| `display_name` | `token-devvm-wizard` |
| `period` | `768h` (32 days) |
| `explicit_max_ttl` | `0` (no hard cap) |
| `policies` | `default`, `sops-admin`, `vault-admin` |
| `orphan` | `true` (not revoked when any parent expires) |

Periodic tokens have no max-TTL; they need renewing at least once per `period`.
Daily renewal leaves a 32× margin. **If devvm is decommissioned and the timer
stops, the token self-expires within ~32 days.** That is deliberate: a root
token would live forever (this is the security trade-off Viktor chose:
periodic + renewer over a never-expiring root token).
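
The margin figures above can be checked with plain shell arithmetic. This is only a sketch: the variable names are illustrative, and the 24-hour renewal interval is an assumption standing in for the `daily` timer.

```bash
#!/usr/bin/env bash
period_hours=768          # token period from the table above
renew_interval_hours=24   # assumption: the daily timer fires roughly every 24h

period_days=$(( period_hours / 24 ))
period_seconds=$(( period_hours * 3600 ))
renewals_per_period=$(( period_hours / renew_interval_hours ))

printf '%dh = %d days = %ds, about %d renewals per period\n' \
  "$period_hours" "$period_days" "$period_seconds" "$renewals_per_period"
```

The seconds value is the same TTL a successful renewal is expected to report.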

## Deploy on a fresh devvm

The renewer is a host-side script + user systemd units, deployed manually (same
model as the other `infra/scripts/` host scripts). From a checkout of the repo
**as user `wizard` on devvm**:

```bash
cd ~/code/infra/scripts
mkdir -p ~/.local/bin ~/.config/systemd/user   # may not exist yet on a fresh box
install -m 0755 vault-token-renew.sh ~/.local/bin/vault-token-renew   # strip .sh
install -m 0644 vault-token-renew.service vault-token-renew.timer ~/.config/systemd/user/

# user manager must survive logout, so the daily timer fires headless
loginctl enable-linger "$USER"

systemctl --user daemon-reload
systemctl --user enable --now vault-token-renew.timer
```

Then mint the token (one-time, interactive; see below). The script and units
carry no secret; only the token itself is sensitive and stays out of git.
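
Since the repo is the source of truth, it can be useful to confirm after a deploy that the live copies still match it. A minimal sketch, assuming the paths from the deploy block above; `check_deployed` is a hypothetical helper, not part of the repo:

```bash
#!/usr/bin/env bash
# check_deployed <repo-file> <live-file>: hypothetical helper, not in the repo.
# Succeeds only when both files exist and are byte-identical.
check_deployed() {
  if cmp -s "$1" "$2" 2>/dev/null; then
    printf 'in sync: %s\n' "$2"
  else
    printf 'DIFFERS or missing: %s\n' "$2"
    return 1
  fi
}

# Example invocations (paths assumed from the deploy block above):
#   check_deployed ~/code/infra/scripts/vault-token-renew.sh ~/.local/bin/vault-token-renew
#   check_deployed ~/code/infra/scripts/vault-token-renew.timer ~/.config/systemd/user/vault-token-renew.timer
```

A non-zero exit means the live file has drifted from the repo (or was never installed) and the `install` step should be re-run.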

## Mint / re-mint the token

Requires an interactive OIDC login (browser), so it can't run unattended:

```bash
export VAULT_ADDR=https://vault.viktorbarzin.me
vault login -method=oidc
# Capture the new token first: redirecting straight into ~/.vault-token would
# truncate the login token before `vault token create` could read it.
new_token=$(vault token create -orphan -period=768h \
  -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard \
  -field=token) &&
  printf '%s\n' "$new_token" > ~/.vault-token
chmod 600 ~/.vault-token
```

Vault prefixes the display name with `token-`, so it becomes `token-devvm-wizard`
(which is what the drift guard checks for). `-orphan` is essential: a child of the
7-day OIDC token would be revoked when that parent expired.

## Health check

```bash
export VAULT_ADDR=https://vault.viktorbarzin.me
vault token lookup | grep -E 'display_name|period|explicit_max_ttl|policies'
# expect: display_name token-devvm-wizard, period 768h, explicit_max_ttl 0s,
#         policies [default sops-admin vault-admin]

# authoritative write-capability check (do NOT trust the policies field alone —
# an OIDC token shows policies=[default] but carries vault-admin via identity):
vault token capabilities secret/data/viktor   # expect create/update/.../sudo

# renewer health
systemctl --user list-timers | grep vault-token-renew   # next/last run
tail -5 ~/.local/state/vault-token-renew.log            # recent results
```

A healthy log line looks like:
`<ts> OK renewed (dn=token-devvm-wizard ttl=2764800s)` (ttl 2764800s = 768h).
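
For a scripted check, the last log line can be classified by its status word. A minimal sketch, assuming the three line shapes the renewer writes (`OK renewed`, `DRIFT:`, `FAIL:`); `classify_log_line` is a hypothetical helper and the sample timestamp is made up:

```bash
#!/usr/bin/env bash
# classify_log_line <line>: hypothetical helper, not part of the repo.
classify_log_line() {
  case "$1" in
    *" OK renewed "*) echo healthy ;;
    *" DRIFT: "*)     echo drift ;;
    *" FAIL: "*)      echo failed ;;
    *)                echo unknown ;;
  esac
}

# Sample line in the documented format (the timestamp is illustrative only).
sample='2026-06-08T00:03:10+00:00 OK renewed (dn=token-devvm-wizard ttl=2764800s)'
classify_log_line "$sample"   # prints: healthy
```

On the host this would typically be fed the real log, for example `classify_log_line "$(tail -n 1 ~/.local/state/vault-token-renew.log)"`.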

## Drift guard & recovery

`~/.vault-token` is the Vault CLI's default token sink, so **any** `vault login`
overwrites it. Two confirmed clobber vectors:

1. `vault login -method=oidc` → replaces it with a 7-day OIDC token (the renewer
   can't push past the OIDC role's 7-day `token_max_ttl`).
2. A stray `vault login -method=kubernetes` (e.g. a headless agent flow) →
   writes a read-only `kubernetes-woodpecker-default` token (can read Vault but
   **cannot** write `secret/*`). This happened 2026-06-05 and went unnoticed for
   two days: reads worked, writes silently 403'd.

To stop the renewer from silently keeping a foreign token alive, it runs a
**drift guard** first: it refuses to renew unless the token is
`token-devvm-wizard` **and** carries `vault-admin`. On drift it logs loudly and
exits non-zero (the systemd unit goes `failed`) rather than renewing someone
else's token. Symptom in the log:

`<ts> DRIFT: ~/.vault-token is dn=... policies=... Refusing to renew a foreign token. Re-mint: ...`

**Recovery: re-mint.** Run the [mint/re-mint](#mint--re-mint-the-token) block
(the DRIFT log line repeats the steps in one-line form). The drift guard detects
but does **not** auto-recover (a deliberate scope choice: version-only, no
self-heal); recovery is the manual re-mint above.
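
The guard's decision can be illustrated in pure bash. This sketch is a variant for illustration, not the deployed code: it uses a `case` pattern for the comma-fenced exact match, and `is_our_token` is a hypothetical name.

```bash
#!/usr/bin/env bash
# is_our_token <display_name> <policies-csv>: hypothetical illustration.
# Returns 0 only for the expected display name AND an exact vault-admin entry.
is_our_token() {
  local dn="$1" pols="$2"
  [[ "$dn" == "token-devvm-wizard" ]] || return 1
  # Wrapping the list in commas makes the match exact, so "vault-admin-ro"
  # is not mistaken for "vault-admin".
  case ",$pols," in
    *,vault-admin,*) return 0 ;;
    *)               return 1 ;;
  esac
}

is_our_token token-devvm-wizard "default,sops-admin,vault-admin" && echo "renew"
is_our_token kubernetes-woodpecker-default "ci,default,terraform-state" || echo "refuse (foreign token)"
is_our_token token-devvm-wizard "default,vault-admin-ro" || echo "refuse (no exact vault-admin)"
```

Both conditions must hold, which is why an OIDC token that carries `vault-admin` under a different display name is still refused.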

## Tests

`infra/scripts/test-vault-token-renew.sh` unit-tests the drift-guard decision
and the lookup-JSON parsers (including the exact 2026-06-05 woodpecker-clobber
case). Run: `bash infra/scripts/test-vault-token-renew.sh`.
57
scripts/test-vault-token-renew.sh
Normal file
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
# Unit tests for the pure drift-guard functions in vault-token-renew.sh.
# Sources the script (vtr_main is guarded) and exercises the decision logic that
# decides whether ~/.vault-token is OUR periodic admin token (renew) or a foreign
# token that clobbered the file (refuse, fail loud). This is exactly the logic
# whose ABSENCE let the 2026-06-05 woodpecker-token clobber be silently renewed
# for two days. Run: bash infra/scripts/test-vault-token-renew.sh
set -uo pipefail
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=/dev/null
source "$DIR/vault-token-renew.sh"

pass=0 fail=0
ok() { # <description> <cmd...> — expects the command to succeed (renew-OK)
  if "${@:2}"; then pass=$((pass + 1)); else
    fail=$((fail + 1)); printf 'FAIL: %s — expected OK, got refuse\n' "$1"
  fi
}
no() { # <description> <cmd...> — expects the command to fail (drift, refuse)
  if "${@:2}"; then
    fail=$((fail + 1)); printf 'FAIL: %s — expected DRIFT, got OK\n' "$1"
  else pass=$((pass + 1)); fi
}
eq() { # <description> <expected> <actual>
  if [[ "$2" == "$3" ]]; then pass=$((pass + 1)); else
    fail=$((fail + 1)); printf 'FAIL: %s — expected [%s] got [%s]\n' "$1" "$2" "$3"
  fi
}

# --- vtr_drift_ok: ONLY our periodic admin token (right name AND vault-admin) renews ---
ok "our token renews" vtr_drift_ok token-devvm-wizard "default,sops-admin,vault-admin"
ok "vault-admin anywhere in list" vtr_drift_ok token-devvm-wizard "default,vault-admin"
ok "policy order irrelevant" vtr_drift_ok token-devvm-wizard "vault-admin,default"
no "woodpecker clobber refused" vtr_drift_ok kubernetes-woodpecker-default "ci,default,terraform-state"
no "oidc token (admin but wrong dn)" vtr_drift_ok oidc-vbarzin "default,sops-admin,vault-admin"
no "right name, no vault-admin" vtr_drift_ok token-devvm-wizard "default,sops-admin"
no "empty display_name" vtr_drift_ok "" "vault-admin"
no "empty policies" vtr_drift_ok token-devvm-wizard ""
no "no substring false-positive" vtr_drift_ok token-devvm-wizard "default,vault-admin-ro"

# --- vtr_display_name / vtr_policies_csv: parse real `vault token lookup -format=json` ---
LOOKUP_OURS='{"data":{"display_name":"token-devvm-wizard","policies":["default","sops-admin","vault-admin"],"identity_policies":null}}'
LOOKUP_OIDC='{"data":{"display_name":"oidc-vbarzin","policies":["default"],"identity_policies":["sops-admin","vault-admin"]}}'
LOOKUP_WP='{"data":{"display_name":"kubernetes-woodpecker-default","policies":["ci","default","terraform-state"],"identity_policies":[]}}'
eq "dn ours" "token-devvm-wizard" "$(vtr_display_name "$LOOKUP_OURS")"
eq "dn oidc" "oidc-vbarzin" "$(vtr_display_name "$LOOKUP_OIDC")"
eq "pols ours" "default,sops-admin,vault-admin" "$(vtr_policies_csv "$LOOKUP_OURS")"
eq "pols oidc merges token+identity" "default,sops-admin,vault-admin" "$(vtr_policies_csv "$LOOKUP_OIDC")"
eq "pols woodpecker" "ci,default,terraform-state" "$(vtr_policies_csv "$LOOKUP_WP")"

# --- parse + decide end-to-end (the real lookup-JSON -> renew/refuse path) ---
ok "ours: parse+decide renews" vtr_drift_ok "$(vtr_display_name "$LOOKUP_OURS")" "$(vtr_policies_csv "$LOOKUP_OURS")"
no "woodpecker: parse+decide refused" vtr_drift_ok "$(vtr_display_name "$LOOKUP_WP")" "$(vtr_policies_csv "$LOOKUP_WP")"
no "oidc: parse+decide refused" vtr_drift_ok "$(vtr_display_name "$LOOKUP_OIDC")" "$(vtr_policies_csv "$LOOKUP_OIDC")"

printf '\n%d passed, %d failed\n' "$pass" "$fail"
(( fail == 0 ))
9
scripts/vault-token-renew.service
Normal file
@@ -0,0 +1,9 @@
[Unit]
Description=Renew the periodic Vault/OpenBao token in ~/.vault-token
Documentation=https://github.com/ViktorBarzin/infra/blob/master/scripts/vault-token-renew.sh
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=%h/.local/bin/vault-token-renew
90
scripts/vault-token-renew.sh
Normal file
@@ -0,0 +1,90 @@
#!/usr/bin/env bash
# Renew the long-lived PERIODIC Vault/OpenBao token stored in ~/.vault-token.
#
# Background: wizard@devvm used to hold a 7-day OIDC login token (re-auth weekly
# via `vault login -method=oidc`). On 2026-06-05 that was replaced with a
# periodic, orphan token so it never expires. Periodic tokens have no max-TTL;
# they only need renewing within each `period` (768h / 32d here). This unit
# renews daily, so the token stays alive indefinitely with huge margin. If the
# box is ever decommissioned and this stops running, the token self-expires
# within ~32 days (unlike a root token, which would live forever).
#
# Token was minted with (vault-admin = path "*" sudo; sops-admin = transit for SOPS):
#   vault token create -orphan -period=768h \
#     -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard
# To recreate if ever lost: `vault login -method=oidc`, run the above with
# `-field=token > ~/.vault-token`, then `chmod 600 ~/.vault-token`.
#
# Source of truth: infra/scripts/vault-token-renew.sh (deployed to
# ~/.local/bin/vault-token-renew). Driven by the systemd USER units
# vault-token-renew.{service,timer}. Deploy + recovery runbook:
# infra/docs/runbooks/vault-token-renew-devvm.md

EXPECTED_DN="token-devvm-wizard"
REQUIRED_POLICY="vault-admin"

# vtr_display_name <lookup-json> -> display_name (empty if absent).
vtr_display_name() {
  printf '%s' "$1" | jq -r '.data.display_name // ""'
}

# vtr_policies_csv <lookup-json> -> comma-joined token policies + identity policies.
# Both are merged because a token minted via OIDC carries vault-admin only in
# identity_policies, while .data.policies shows just [default] (misleading on its
# own — see memory id=4211). Our periodic token carries them as token policies.
vtr_policies_csv() {
  printf '%s' "$1" | jq -r '((.data.policies // []) + (.data.identity_policies // [])) | join(",")'
}

# vtr_drift_ok <display_name> <policies-csv> -> 0 if this is OUR periodic admin
# token (right display name AND vault-admin present), 1 otherwise. The comma
# fencing makes the policy match exact (so "vault-admin-ro" never matches).
vtr_drift_ok() {
  local dn="$1" pols="$2"
  [ "$dn" = "$EXPECTED_DN" ] || return 1
  printf ',%s,' "$pols" | grep -q ",$REQUIRED_POLICY," || return 1
}

vtr_main() {
  set -euo pipefail
  export PATH="/usr/local/bin:/usr/bin:/bin:${PATH:-}"
  export VAULT_ADDR="${VAULT_ADDR:-https://vault.viktorbarzin.me}"

  local log info dn pols out ttl
  log="${XDG_STATE_HOME:-$HOME/.local/state}/vault-token-renew.log"
  mkdir -p "$(dirname "$log")"

  if ! info=$(vault token lookup -format=json 2>&1); then
    printf '%s FAIL: token lookup: %s\n' "$(date -Is)" "$info" >>"$log"
    exit 1
  fi
  dn=$(vtr_display_name "$info")
  pols=$(vtr_policies_csv "$info")

  # Drift guard (added 2026-06-07): the renewer must NOT keep a FOREIGN token alive.
  # On 2026-06-05 a stray `vault login -method=kubernetes` overwrote ~/.vault-token
  # with a read-only woodpecker token, and this script then silently renewed THAT
  # for two days — masking the loss of write access. So before renewing, confirm
  # the token is our periodic admin token; if it has drifted, fail loudly (systemd
  # marks the unit failed) instead of keeping someone else's token alive.
  if ! vtr_drift_ok "$dn" "$pols"; then
    printf '%s DRIFT: ~/.vault-token is dn=%q policies=%q (expected dn=%q with %q). Refusing to renew a foreign token. Re-mint: vault login -method=oidc && vault token create -orphan -period=768h -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard -field=token > ~/.vault-token && chmod 600 ~/.vault-token\n' \
      "$(date -Is)" "$dn" "$pols" "$EXPECTED_DN" "$REQUIRED_POLICY" >>"$log"
    exit 1
  fi

  # `vault token renew` with no argument renews the calling token (renew-self).
  # On success, log only the new TTL (never the raw JSON — it contains the token).
  if out=$(vault token renew -format=json 2>&1); then
    ttl=$(printf '%s' "$out" | jq -r '.auth.lease_duration' 2>/dev/null || echo '?')
    printf '%s OK renewed (dn=%s ttl=%ss)\n' "$(date -Is)" "$dn" "$ttl" >>"$log"
  else
    printf '%s FAIL: %s\n' "$(date -Is)" "$out" >>"$log"
    exit 1
  fi
}

# Run main only when executed directly, so the test can source the pure functions.
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
  vtr_main "$@"
fi
10
scripts/vault-token-renew.timer
Normal file
@@ -0,0 +1,10 @@
[Unit]
Description=Daily renewal of the periodic Vault token in ~/.vault-token

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target