Infra pipelines were failing intermittently across all authors (e.g. #241-244,
#247) with the git clone step exiting 128:
git fetch --depth=1 --filter=tree:0 ... (partial/treeless clone)
git reset --hard <sha>
fatal: could not fetch <tree-sha> from promisor remote
remote: 404 page not found
The plugin-git clone defaulted to a partial (treeless) clone. The initial ref
fetch carries credentials, but the lazy *promisor* object fetch triggered by
`git reset --hard` hits the PRIVATE Forgejo repo without creds -> 404 -> exit
128. Whether it fired was luck-of-the-draw, hence the ~50% intermittent failures
fleet-wide (not specific to any commit).
Fix: set `partial: false` on every clone block so all objects for the (still
shallow) commit are fetched upfront with creds — no fragile lazy promisor fetch.
Diagnosed against the woodpecker Postgres DB (steps/log_entries) since the
Woodpecker HTTP API was itself flapping. Earlier "permission for ViktorBarzin"
log lines were an unrelated cross-forge red herring.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app published to Production or the
refresh token expires ~weekly.
Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Kyverno generate+synchronize only manages secrets it created itself.
Existing Terraform-managed secrets in ~70 namespaces weren't updated.
Now loops through all namespaces and kubectl apply the new cert.
bitnami/kubectl runs as non-root UID 1001, cannot read git-crypt
decrypted secrets owned by root. Switch to alpine (runs as root)
with kubectl downloaded directly.
Kyverno ClusterPolicy clones tls-secret from kyverno namespace to all
namespaces with synchronize=true. Renewal pipeline now updates the source
secret via kubectl, verifies cert validity, and sends Slack notification.
Phase 5 — CI pipelines:
- default.yml: add SOPS decrypt in prepare step, change git add . to
specific paths (stacks/ state/ .woodpecker/), cleanup on success+failure
- renew-tls.yml: change git add . to git add secrets/ state/
Phase 6 — sensitive=true:
- Add sensitive = true to 256 variable declarations across 149 stack files
- Prevents secret values from appearing in terraform plan output
- Does NOT modify shared modules (ingress_factory, nfs_volume) to avoid
breaking module interface contracts
Note: CI pipeline SOPS decryption requires sops_age_key Woodpecker secret
to be created before the pipeline will work with SOPS. Until then, the old
terraform.tfvars path continues to function.