From a5e097088a0e28b1cf0e1bd9a59d99df444c579d Mon Sep 17 00:00:00 2001 From: Viktor Barzin Date: Sun, 19 Apr 2026 14:30:39 +0000 Subject: [PATCH] [ci] Persist VAULT_TOKEN across Woodpecker step commands MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## Context Follow-up to commit 2eca011c (bd code-e1x). That commit attached the `terraform-state` policy to the `ci` Vault role and propagated apply- loop failures so the pipeline actually fails when a stack fails. On the very first push to exercise it (pipeline 361), the platform apply step died with: [vault] Starting apply... state-sync: ERROR — no Vault token and no age key at ~/.config/sops/age/keys.txt [vault] FAILED (exit 1) Root cause: in Woodpecker's `commands:` list, each `- |` item runs in a fresh shell. The dedicated "Vault auth" command was doing `export VAULT_TOKEN=...`, but that export was lost by the time the apply command ran. Tier-0 stacks depended on Vault Transit (via `scripts/state-sync`), and Tier-1 stacks depend on `vault read database/static-creds/pg-terraform-state` via `scripts/tg` — both silently fell through to their "no Vault" error path. This bug was latent before 2eca011c because the old apply loop swallowed per-stack exit codes. Now that we surface them, the pipeline fails honestly — but fails on every run. Fixing the missing token propagation is the last mile. ## This change - Pin `VAULT_ADDR` at the step's `environment:` level so every command inherits it without an explicit export. - In the Vault auth command, assert the auth succeeded (non-empty, non-"null" token) then write the token to `~/.vault-token` with `umask 077`. `vault`, `scripts/tg`, and `scripts/state-sync` all fall through to `~/.vault-token` when `VAULT_TOKEN` env is unset. ## What is NOT in this change - A broader refactor to fold the multi-step chain into a single `- |` script — preserving the existing granular structure keeps individual step logs grep-friendly and failures localised. - Restoring the VAULT_TOKEN export too — redundant once ~/.vault-token is written, and would need duplicating into each command anyway. ## Test Plan ### Automated N/A (pure YAML change). Will be verified by the very next CI run — the push creating this commit. ### Manual Verification Watch `ci.viktorbarzin.me/repos/1/pipelines` for the pipeline whose commit matches this one. Expected: - `default` workflow exercises the auth + apply steps. - Platform apply for `vault` stack runs state-sync decrypt → detects no drift (I applied locally already) → OK. - Tier-1 stacks (if any in the diff): `vault read database/static- creds/pg-terraform-state` returns creds → apply runs. - No "state-sync: ERROR" or "Cannot read PG credentials" errors. - `default` workflow state: success. - Overall pipeline status: still failure because `build-cli` is independently broken (bd code-12b); that's cosmetic. Refs: bd code-e1x Co-Authored-By: Claude Opus 4.7 (1M context) --- .woodpecker/default.yml | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/.woodpecker/default.yml b/.woodpecker/default.yml index 4cfc27f8..4b4952aa 100644 --- a/.woodpecker/default.yml +++ b/.woodpecker/default.yml @@ -37,6 +37,12 @@ steps: environment: SLACK_WEBHOOK: from_secret: slack_webhook + # Each `- |` command runs in a fresh shell, so we can't rely on an + # `export VAULT_ADDR=...` in the auth command persisting — pin it at + # step level. VAULT_TOKEN is still per-command; we persist it to + # ~/.vault-token (auto-read by `vault` CLI) so downstream commands + # don't need explicit token propagation. + VAULT_ADDR: http://vault-active.vault.svc.cluster.local:8200 commands: # ── Skip CI commits ── - | @@ -55,9 +61,17 @@ steps: # ── Vault auth ── - | SA_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) - export VAULT_ADDR=http://vault-active.vault.svc.cluster.local:8200 - export VAULT_TOKEN=$(curl -s -X POST "$VAULT_ADDR/v1/auth/kubernetes/login" \ + VAULT_TOKEN=$(curl -s -X POST "$VAULT_ADDR/v1/auth/kubernetes/login" \ -d "{\"role\":\"ci\",\"jwt\":\"$SA_TOKEN\"}" | jq -r .auth.client_token) + if [ -z "$VAULT_TOKEN" ] || [ "$VAULT_TOKEN" = "null" ]; then + echo "ERROR: Vault K8s auth failed (role=ci, ns=woodpecker)" >&2 + exit 1 + fi + # Persist for downstream `- |` blocks (each runs in a fresh shell, + # so exporting VAULT_TOKEN wouldn't help). `vault`, `scripts/tg`, + # and `scripts/state-sync` all fall through to ~/.vault-token when + # the env var is unset. + umask 077; printf '%s' "$VAULT_TOKEN" > "$HOME/.vault-token" # ── Detect changed stacks ── - |