ci: add vault CLI to infra-ci image + surface real errors in scripts/tg

The Woodpecker CI pipeline has been silently failing to apply Tier 1
stacks since the state-migration commit e80b2f02 because the Alpine
CI image never had the vault CLI. `scripts/tg` swallowed stderr with
`2>/dev/null` and surfaced a misleading "Cannot read PG credentials
from Vault" message — the real error was `sh: vault: not found`.

Verified with an in-cluster probe: woodpecker/default SA + role=ci
already gets the terraform-state policy and has read capability on
database/static-creds/pg-terraform-state. Auth was never the problem;
the vault binary just wasn't there.
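
The probe commands are not part of this commit. A minimal sketch of how
such a check might be run from a pod using the woodpecker/default service
account follows. The `auth/kubernetes` mount path, the default token file
location, and the `probe_read_access` helper name are assumptions, not
taken from the repo.

```shell
#!/bin/sh
# Hypothetical probe: log in with a Kubernetes SA token, then ask Vault
# which capabilities the resulting token has on a given path.
# Assumes the Kubernetes auth method is mounted at auth/kubernetes.
probe_read_access() {
  role="$1"
  path="$2"
  jwt_file="${3:-/var/run/secrets/kubernetes.io/serviceaccount/token}"

  # key=@file makes the vault CLI read the value from that file.
  token=$(vault write -field=token auth/kubernetes/login \
    role="$role" jwt=@"$jwt_file") || return 1

  # The command substitution is a subshell, so the exported VAULT_TOKEN
  # does not leak into the caller's environment.
  caps=$(VAULT_TOKEN="$token"; export VAULT_TOKEN
         vault token capabilities "$path") || return 1

  echo "capabilities on $path: $caps"
  # Output is a comma-separated list such as "read" or "create, read".
  case ", $caps," in
    *", read,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Called as `probe_read_access ci database/static-creds/pg-terraform-state`,
a zero exit status would indicate that the role's token can read the
static credentials.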

- ci/Dockerfile: install the vault CLI, pinned to v1.18.1 (matches server)
- scripts/tg: add pre-flight check for the vault binary; surface real
  vault output on failure
- Next build-ci-image.yml run rebuilds :latest with vault included;
  subsequent default.yml runs unblock monitoring apply (code-aoxk)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-04-22 08:46:50 +00:00
parent 4a343c33f0
commit 3eb8b9a4ea
2 changed files with 26 additions and 6 deletions

ci/Dockerfile

@@ -1,12 +1,11 @@
FROM alpine:3.20
# Rebuild 2026-04-19 — previous :latest index referenced missing blobs (404 on 98f718c8 / 27d5ab83)
# Pin versions to match CI requirements
ARG TERRAFORM_VERSION=1.5.7
ARG TERRAGRUNT_VERSION=0.99.4
ARG SOPS_VERSION=3.9.4
ARG KUBECTL_VERSION=1.34.0
ARG VAULT_VERSION=1.18.1
# Install system packages (single layer)
RUN apk add --no-cache \
@@ -36,6 +35,16 @@ RUN curl -fsSL "https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/ku
-o /usr/local/bin/kubectl \
&& chmod +x /usr/local/bin/kubectl
# Vault CLI — required by scripts/tg for Tier 1 stack PG credential reads
# and Tier 0 advisory locks. Pinned to server version (1.18.1). Without this
# the CI pipeline surfaces the misleading "Cannot read PG credentials" error
# because scripts/tg swallows stderr ("vault: not found").
RUN curl -fsSL "https://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip" \
-o /tmp/vault.zip \
&& unzip /tmp/vault.zip -d /usr/local/bin/ \
&& rm /tmp/vault.zip \
&& vault version
# Provider cache directory (shared across stacks)
ENV TF_PLUGIN_CACHE_DIR=/tmp/terraform-plugin-cache
ENV TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE=1
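
The `vault version` call at the end of the RUN step above fails the image
build if the binary is missing or not executable. The same idea could be
extended to one smoke test over every CLI the pipeline depends on. The
sketch below is hypothetical: the `require_tools` helper is not in the
repo, and the tool list in the usage note is an assumption based on the
versions pinned in this Dockerfile plus `jq`, which scripts/tg calls.

```shell
#!/bin/sh
# Hypothetical smoke test: report every missing CLI, not just the first.
require_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "ERROR: missing required tools:$missing" >&2
    return 1
  fi
  return 0
}
```

Something like `require_tools terraform terragrunt sops kubectl vault jq`
could run as the last build step or the first pipeline step, so that a
missing binary is reported by name before any stack is applied.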

scripts/tg

@@ -72,12 +72,23 @@ if [ -n "$STACK_NAME" ]; then
 else
   # Tier 1: PG backend — fetch credentials from Vault
   if [ -z "${PG_CONN_STR:-}" ]; then
-    PG_CREDS=$(vault read -format=json database/static-creds/pg-terraform-state 2>/dev/null) || {
-      echo "ERROR: Cannot read PG credentials from Vault. Run: vault login -method=oidc" >&2
+    # Pre-flight: vault CLI must be available. Previously CI failed with a
+    # misleading "Cannot read PG credentials" message because the Alpine CI
+    # image lacked the vault binary — the 2>/dev/null below swallowed the
+    # real "vault: not found" error. Fail fast with a clear message instead.
+    if ! command -v vault >/dev/null 2>&1; then
+      echo "ERROR: vault CLI not found on PATH. Install it or use an image that includes it (ci/Dockerfile)." >&2
+      exit 1
+    fi
+    VAULT_OUT=$(vault read -format=json database/static-creds/pg-terraform-state 2>&1) || {
+      echo "ERROR: Cannot read PG credentials from Vault. Vault output follows:" >&2
+      echo "$VAULT_OUT" >&2
+      echo "" >&2
+      echo "Hint: humans run 'vault login -method=oidc'; CI auths via K8s SA (role=ci)." >&2
       exit 1
     }
-    PG_USER=$(echo "$PG_CREDS" | jq -r .data.username)
-    PG_PASS=$(echo "$PG_CREDS" | jq -r .data.password)
+    PG_USER=$(echo "$VAULT_OUT" | jq -r .data.username)
+    PG_PASS=$(echo "$VAULT_OUT" | jq -r .data.password)
     export PG_CONN_STR="postgres://${PG_USER}:${PG_PASS}@10.0.20.200:5432/terraform_state?sslmode=disable"
   fi
 fi
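
One side effect of capturing with `2>&1` is that anything vault writes to
stderr during a successful read would end up in the string later passed
to `jq`. Whether the vault CLI does that for this path is not established
here. A variant that keeps the two streams apart while still showing the
real error might look like the sketch below; `read_vault_json` is a
hypothetical helper, not part of scripts/tg.

```shell
#!/bin/sh
# Hypothetical helper: run a vault read with stdout (JSON) and stderr
# captured separately, so warnings cannot corrupt the JSON.
read_vault_json() {
  path="$1"
  err_file=$(mktemp) || return 1
  if out=$(vault read -format=json "$path" 2>"$err_file"); then
    rm -f "$err_file"
    printf '%s\n' "$out"
    return 0
  fi
  # On failure, show what vault actually said instead of hiding it.
  echo "ERROR: Cannot read $path from Vault. Vault output follows:" >&2
  cat "$err_file" >&2
  rm -f "$err_file"
  return 1
}
```

A caller could then use
`PG_CREDS=$(read_vault_json database/static-creds/pg-terraform-state) || exit 1`
and pipe `$PG_CREDS` to `jq` as before. The cost is a temp file per call.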