CI now uses scripts/tg instead of raw terragrunt apply, acquiring the
same per-stack Vault KV lock that user sessions use. This prevents CI
from overwriting in-flight user applies.
Changes:
- Switch from xargs -P 4 (parallel) to a serial while-read loop
- CI skips stacks locked by users instead of racing them
- Git rebase failures now exit 1 instead of silently continuing
- Updated header comments to reflect new locking behavior
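
A minimal sketch of the serial, skip-if-locked behavior described above. It is an illustration only: `apply_serially`, `lock_holder`, and `apply` are hypothetical names, and the real logic lives in the shell pipeline and scripts/tg.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def apply_serially(
    stacks: Iterable[str],
    lock_holder: Callable[[str], Optional[str]],
    apply: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Apply stacks one at a time, skipping any stack that is currently locked.

    lock_holder(stack) is assumed to return the holder's name, or None when
    the stack is unlocked. apply(stack) stands in for the wrapped apply call.
    """
    applied: List[str] = []
    skipped: List[str] = []
    for stack in stacks:  # serial: at most one apply runs at any time
        if lock_holder(stack) is not None:
            skipped.append(stack)  # do not race an in-flight user apply
            continue
        apply(stack)
        applied.append(stack)
    return applied, skipped
```

Returning the skipped list lets the caller report which stacks were left for a later run.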
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add WOODPECKER_BACKEND_K8S_PULL_SECRET_NAMES to the agent env so step
pods can pull from the private registry (registry.viktorbarzin.me:5050)
- Add fallback in default.yml for when HEAD~1 is unavailable (shallow
clone with depth=1): fetch more history, or apply all platform
stacks as a safe default
- Root cause: pipeline #243 failed because infra-ci:latest image
couldn't be pulled (no imagePullSecrets on step pods)
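
A rough sketch of the fallback decision described above, assuming stacks live under `stacks/<name>/` and that the caller passes `None` when no diff against HEAD~1 could be produced. `stacks_to_apply` and its parameters are hypothetical names, not the pipeline's actual code.

```python
from typing import List, Optional, Sequence


def stacks_to_apply(
    changed_paths: Optional[Sequence[str]],
    platform_stacks: Sequence[str],
) -> List[str]:
    """Pick the stacks to apply from a list of changed file paths.

    changed_paths is None when the diff is unavailable (for example, a
    depth=1 clone with no HEAD~1); in that case every platform stack is
    returned as the safe default.
    """
    if changed_paths is None:
        return sorted(platform_stacks)
    changed = {
        path.split("/")[1]
        for path in changed_paths
        # Assumed layout: stacks/<name>/<file>; other paths are ignored here.
        if path.startswith("stacks/") and path.count("/") >= 2
    }
    return sorted(changed)
```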
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New custom CI Docker image (ci/Dockerfile) with TF 1.5.7, TG 0.99.4,
git-crypt, sops, kubectl pre-installed. Pushed to private registry.
Eliminates 17 apk add calls + binary downloads per pipeline run.
- Unified CI pipeline: merge default.yml + app-stacks.yml into one.
Changed-stacks-only detection (git diff, with global-file fallback).
Concurrency limit (xargs -P 4). Step consolidation (2 steps vs 4).
Shallow clone (depth=2). Provider cache (TF_PLUGIN_CACHE_DIR).
- Per-stack Vault advisory locks in scripts/tg. 30min TTL with stale
lock detection. Blocks concurrent applies to same stack.
- TF_PLUGIN_CACHE_DIR enabled by default in scripts/tg for local dev.
- Daily drift detection pipeline (.woodpecker/drift-detection.yml).
Runs terraform plan on all stacks, Slack alert on drift.
- CI image build pipeline (.woodpecker/build-ci-image.yml).
Expected pipeline run time: ~5-10 min → ~2-4 min.
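
A simplified sketch of the per-stack advisory lock with a 30-minute TTL and stale-lock detection. An in-memory dict stands in for the Vault KV path, and `StackLocks`, `acquire`, and `release` are hypothetical names; the real implementation is the shell logic in scripts/tg.

```python
import time
from typing import Callable, Dict, Tuple

LOCK_TTL_SECONDS = 30 * 60  # 30min TTL, as in the lock description above


class StackLocks:
    """Advisory per-stack locks; a dict stands in for the Vault KV store."""

    def __init__(self, now: Callable[[], float] = time.time) -> None:
        self._locks: Dict[str, Tuple[str, float]] = {}
        self._now = now  # injectable clock so expiry can be tested

    def acquire(self, stack: str, holder: str) -> bool:
        current = self._locks.get(stack)
        if current is not None:
            owner, acquired_at = current
            stale = self._now() - acquired_at >= LOCK_TTL_SECONDS
            if owner != holder and not stale:
                return False  # a live lock is held by someone else
        # Free, stale, or already ours: take (or refresh) the lock.
        self._locks[stack] = (holder, self._now())
        return True

    def release(self, stack: str, holder: str) -> None:
        current = self._locks.get(stack)
        if current is not None and current[0] == holder:
            del self._locks[stack]
```

This sketch is not atomic across processes. Against a real KV v2 mount, the read-then-write step would need something like a check-and-set write to avoid two callers acquiring the same lock.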
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3: all 27 platform modules now run as independent stacks.
Platform reduced to empty shell (outputs only) for backward compat
with 72 app stacks that declare dependency "platform".
Fixed technitium cross-module dashboard reference by copying file.
Woodpecker pipeline applies all 27+1 stacks in parallel via loop.
All applied with zero destroys.
Phase 2 of platform stack split. 5 more modules extracted into
independent stacks. All applied successfully with zero destroys.
Cloudflared now reads k8s_users from Vault directly to compute
user_domains. Woodpecker pipeline runs all 8 extracted stacks
in parallel. Memory bumped to 6Gi for 9 concurrent TF processes
(8 extracted stacks plus platform).
Platform reduced from 27 to 19 modules.
Phase 1 of platform stack split for parallel CI applies.
All 3 modules were fully independent (no cross-module refs).
State migrated via terraform state mv. All 3 stacks applied
with zero changes, apart from pre-existing ResourceQuota drift in dbaas.
Woodpecker pipeline updated to run extracted stacks in parallel.
LimitRange defaults containers to 192Mi of memory, which is insufficient
for terragrunt apply on the platform stack (48 vault refs, many modules).
Set an explicit 1Gi request / 2Gi limit via backend_options.
- build-cli.yml: comment out cache_from/cache_to to avoid BuildKit
"short read" errors from corrupted registry cache
- default.yml: add git pull --rebase before push in cleanup-and-push
to handle remote having newer commits
Vault is now the sole source of truth for secrets. The SOPS pipeline
is removed entirely; auth is via `vault login -method=oidc`.
Part A: SOPS removal
- vault/main.tf: delete 990 lines (93 vars + 43 KV write resources),
add self-read data source for OIDC creds from secret/vault
- terragrunt.hcl: remove SOPS var loading, vault_root_token, check_secrets hook
- scripts/tg: remove SOPS decryption, keep -auto-approve logic
- .woodpecker/default.yml: replace SOPS with Vault K8s auth via curl
- Delete secrets.sops.json, .sops.yaml
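
For reference, a sketch of the request shape used by Vault's Kubernetes auth login, which the pipeline issues via curl. It assumes the auth method is mounted at the default `kubernetes` path; `build_k8s_login_request` and the role name in the test are hypothetical, and nothing is sent over the network here.

```python
import json
import urllib.request


def build_k8s_login_request(
    vault_addr: str,
    role: str,
    jwt: str,
    mount: str = "kubernetes",  # assumption: default auth mount path
) -> urllib.request.Request:
    """Build (but do not send) a Vault Kubernetes auth login request."""
    url = f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login"
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

In a pod, the JWT would typically be the service account token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token, and a successful response carries the Vault token under `auth.client_token`.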
Part B: External Secrets Operator
- New stack stacks/external-secrets/ with Helm chart + 2 ClusterSecretStores
(vault-kv for KV v2, vault-database for DB engine)
Part C: Database secrets engine (in vault/main.tf)
- MySQL + PostgreSQL connections with static role rotation (24h)
- 6 MySQL roles (speedtest, wrongmove, codimd, nextcloud, shlink, grafana)
- 6 PostgreSQL roles (trading, health, linkwarden, affine, woodpecker, claude_memory)
Part D: Kubernetes secrets engine (in vault/main.tf)
- RBAC for Vault SA to manage K8s tokens
- Roles: dashboard-admin, ci-deployer, openclaw, local-admin
- New scripts/vault-kubeconfig helper for dynamic kubeconfig
K8s auth method with scoped policies for CI, ESO, OpenClaw, Woodpecker sync.
Phase 5 — CI pipelines:
- default.yml: add SOPS decrypt in prepare step, change git add . to
specific paths (stacks/ state/ .woodpecker/), cleanup on success+failure
- renew-tls.yml: change git add . to git add secrets/ state/
Phase 6 — sensitive=true:
- Add sensitive = true to 256 variable declarations across 149 stack files
- Prevents secret values from appearing in terraform plan output
- Does NOT modify shared modules (ingress_factory, nfs_volume) to avoid
breaking module interface contracts
Note: CI pipeline SOPS decryption requires the sops_age_key Woodpecker
secret. Until that secret is created, the old terraform.tfvars path
continues to function.