infra/.woodpecker
Viktor Barzin a5e097088a [ci] Persist VAULT_TOKEN across Woodpecker step commands
## Context
Follow-up to commit 2eca011c (bd code-e1x). That commit attached the
`terraform-state` policy to the `ci` Vault role and propagated apply-
loop failures so the pipeline actually fails when a stack fails. On
the very first push to exercise it (pipeline 361), the platform apply
step died with:

  [vault] Starting apply...
  state-sync: ERROR — no Vault token and no age key at ~/.config/sops/age/keys.txt
  [vault] FAILED (exit 1)

Root cause: in Woodpecker's `commands:` list, each `- |` item runs in
a fresh shell. The dedicated "Vault auth" command was doing
`export VAULT_TOKEN=...`, but that export was lost by the time the
apply command ran. Tier-0 stacks depended on Vault Transit (via
`scripts/state-sync`), and Tier-1 stacks depend on
`vault read database/static-creds/pg-terraform-state` via `scripts/tg`
— both silently fell through to their "no Vault" error path.

This bug was latent before 2eca011c because the old apply loop
swallowed per-stack exit codes. Now that we surface them, the pipeline
fails honestly — but fails on every run. Fixing the missing token
propagation is the last mile.

## This change
- Pin `VAULT_ADDR` at the step's `environment:` level so every command
  inherits it without an explicit export.
- In the Vault auth command, assert the auth succeeded (non-empty,
  non-"null" token) then write the token to `~/.vault-token` with
  `umask 077`. `vault`, `scripts/tg`, and `scripts/state-sync` all
  fall through to `~/.vault-token` when `VAULT_TOKEN` env is unset.

## What is NOT in this change
- A broader refactor to fold the multi-step chain into a single
  `- |` script — preserving the existing granular structure keeps
  individual step logs grep-friendly and failures localised.
- Restoring the VAULT_TOKEN export too — redundant once ~/.vault-token
  is written, and would need duplicating into each command anyway.

## Test Plan
### Automated
N/A (pure YAML change). Will be verified by the very next CI run —
the push creating this commit.

### Manual Verification
Watch `ci.viktorbarzin.me/repos/1/pipelines` for the pipeline whose
commit matches this one. Expected:
- `default` workflow exercises the auth + apply steps.
- Platform apply for `vault` stack runs state-sync decrypt → detects
  no drift (I applied locally already) → OK.
- Tier-1 stacks (if any in the diff): `vault read database/static-
  creds/pg-terraform-state` returns creds → apply runs.
- No "state-sync: ERROR" or "Cannot read PG credentials" errors.
- `default` workflow state: success.
- Overall pipeline status: still failure because `build-cli` is
  independently broken (bd code-12b); that's cosmetic.

Refs: bd code-e1x

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:30:39 +00:00
..
build-ci-image.yml fix: remove manual event from build-ci-image to fix issue automation 2026-04-15 17:31:25 +00:00
build-cli.yml fix: CI pipeline - disable corrupted cache, add pull before push 2026-03-15 22:51:08 +00:00
default.yml [ci] Persist VAULT_TOKEN across Woodpecker step commands 2026-04-19 14:30:39 +00:00
drift-detection.yml [infra] Wire drift detection to Pushgateway + alert on stale/unaddressed drift 2026-04-18 22:42:51 +00:00
issue-automation.yml [claude-agent-service] Migrate all pipelines from DevVM SSH to K8s HTTP 2026-04-18 10:12:02 +00:00
k8s-portal.yml add generic multi-user cluster onboarding system 2026-03-15 22:23:36 +00:00
postmortem-todos.yml [claude-agent-service] Migrate all pipelines from DevVM SSH to K8s HTTP 2026-04-18 10:12:02 +00:00
provision-user.yml fix: remove manual event from build-ci-image to fix issue automation 2026-04-15 17:31:25 +00:00
pve-nfs-exports-sync.yml [infra] Add Woodpecker pipeline to deploy PVE /etc/exports (Wave 6b) 2026-04-18 23:21:36 +00:00
renew-tls.yml fix(renew-tls): update TLS secret in ALL namespaces, not just kyverno 2026-03-23 22:36:31 +02:00