ci(woodpecker): stop applying/planning the Tier-0 vault stack in CI

The nightly drift-detection cron and every push-triggered apply that touches
the Tier-0 `vault` stack have been failing because CI runs terragrunt
plan/apply on that stack, which manages Vault's own transit mount and ACL
policies. The `ci` Vault role used by CI intentionally lacks those admin
permissions (`sys/mounts`, `sys/policies/acl`), so the run always errors:
  - apply: 403 on `vault_mount.transit` and `vault_policy.personal_emo`, plus
    an "Invalid for_each argument" error (`local.k8s_users`, read from
    `secret/platform`, is deferred)
  - drift: terragrunt plan exits 1 → fails the whole nightly run

`vault` is Tier-0, i.e. applied by a human via OIDC, so skip it in both pipelines:
- default.yml: skip `vault` in the platform-apply loop (kept in
  PLATFORM_STACKS so the app-stack detector still excludes it)
- drift-detection.yml: skip `vault` in the per-stack plan loop
- ci-cd.md: document the exclusion on both pipeline rows
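The stack selection the two pipelines perform after this change can be modelled roughly as below. This is a minimal sketch and not the actual pipeline code. The real loops are shell steps in `.woodpecker/default.yml` and `.woodpecker/drift-detection.yml`, which this commit page does not show. Only `PLATFORM_STACKS` and the `vault` stack come from the commit. The other stack names, `HUMAN_APPLIED`, and every function name are hypothetical. The model also shows why `vault` stays in `PLATFORM_STACKS`: removing it would let the app-stack detector pick it up.

```python
"""Minimal model of which stacks each Woodpecker pipeline touches after this change.

Hypothetical, for illustration only: the real pipelines are shell loops. Every
name below except PLATFORM_STACKS and "vault" is invented, and the non-vault
stack names are placeholders.
"""
from typing import Callable, Iterable, List, Sequence

# "vault" stays in PLATFORM_STACKS; the other entries are placeholder names.
PLATFORM_STACKS: List[str] = ["vault", "platform", "monitoring"]

# Hypothetical set of Tier-0 stacks: applied by a human via OIDC, never by CI.
HUMAN_APPLIED = frozenset({"vault"})


def app_stacks(changed: Iterable[str],
               platform_stacks: Sequence[str] = PLATFORM_STACKS) -> List[str]:
    """App-stack detector: any changed stack that is not a platform stack."""
    return [s for s in changed if s not in platform_stacks]


def platform_apply_targets(changed: Iterable[str]) -> List[str]:
    """Platform-apply loop (default.yml): changed platform stacks minus Tier-0."""
    changed_set = set(changed)
    return [s for s in PLATFORM_STACKS
            if s in changed_set and s not in HUMAN_APPLIED]


def drift_plan_targets(all_stacks: Iterable[str]) -> List[str]:
    """Per-stack plan loop (drift-detection.yml): every stack except Tier-0."""
    return [s for s in all_stacks if s not in HUMAN_APPLIED]


def run_drift(all_stacks: Iterable[str], plan: Callable[[str], int]) -> int:
    """Return 1 if any planned stack errors (plan exit code 1), else 0.

    One erroring plan fails the whole nightly run, so a stack whose plan
    always 403s under the ci role has to be left out of the loop.
    """
    errored = [s for s in drift_plan_targets(all_stacks) if plan(s) == 1]
    return 1 if errored else 0


if __name__ == "__main__":
    changed = ["vault", "platform", "blog"]
    print(platform_apply_targets(changed))  # ['platform']
    print(app_stacks(changed))  # ['blog']
    # Dropping "vault" from PLATFORM_STACKS instead would leak it into the app path:
    print(app_stacks(changed, [s for s in PLATFORM_STACKS if s != "vault"]))  # ['vault', 'blog']
    # Stub plan: vault's plan errors (exit 1), every other stack is clean (exit 0).
    print(run_drift(PLATFORM_STACKS + ["blog"], lambda s: 1 if s == "vault" else 0))  # 0
```

In this model `vault` has to be listed in two places. Its entry in `PLATFORM_STACKS` keeps it away from the app-stack path, and its entry in the Tier-0 skip set keeps it out of both the platform-apply loop and the drift plan loop.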

Found during a CI-health sweep after the user reported many failures: GitHub
Actions was all green, and every Woodpecker repo was green except for this
recurring infra-repo failure, which appeared twice because the repo is
registered in Woodpecker both as legacy repo-1 and as repo-82.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-27 15:48:20 +00:00
parent 81c2b14e29
commit bbc797b30e
3 changed files with 15 additions and 2 deletions

@@ -174,9 +174,9 @@ Woodpecker is **deploy + cluster-touching steps only**:
 | Pipeline | File | Purpose |
 |----------|------|---------|
 | per-app deploy | `.woodpecker/deploy.yml` (each repo) | `kubectl set image` + Slack notify (event: **manual**) |
-| terragrunt apply | `.woodpecker/default.yml` | Changed-stacks apply on push to master (runs in `infra-ci`) |
+| terragrunt apply | `.woodpecker/default.yml` | Changed-stacks apply on push to master (runs in `infra-ci`). **Skips Tier-0 `vault`** — it's human-applied via OIDC; the CI `ci` role lacks Vault-admin perms (`sys/mounts`, `sys/policies/acl`) so a CI apply 403s |
 | certbot | `.woodpecker/renew-tls.yml` | TLS renewal cron |
-| drift-detection | `.woodpecker/drift-detection.yml` | Nightly Terraform drift (runs in `infra-ci`) |
+| drift-detection | `.woodpecker/drift-detection.yml` | Nightly Terraform drift (runs in `infra-ci`). **Skips Tier-0 `vault`** (its `plan` 403s under the `ci` role and would fail the whole run) |
 | provision-user | `.woodpecker/provision-user.yml` | Add namespace-owner user from Vault spec |
 | registry-config-sync | `.woodpecker/registry-config-sync.yml` | SCP `modules/docker-registry/*` → `10.0.20.10` on change |
 | pve-nfs-exports-sync | `.woodpecker/pve-nfs-exports-sync.yml` | Sync `scripts/pve-nfs-exports` → `/etc/exports` on PVE |