Commit graph

7 commits

Author SHA1 Message Date
Viktor Barzin
fd0f4a0365 fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]
6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:45:33 +00:00
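
The failure mode and the forward-only fix described in this entry can be illustrated with a toy model. This is a sketch, not Git's implementation: it assumes a commit's tree is simply a snapshot of the index (path to content). Every path and content string below is hypothetical except stacks/stem95su/gdrive-sync.tf.

```python
# Toy model (assumption): a tree is a dict mapping path -> content,
# and a commit's tree is whatever the index holds at commit time.

def commit_from_index(index):
    """Snapshot the index as the new commit's tree."""
    return dict(index)

def dropped_paths(parent_tree, child_tree):
    """Paths present in the parent tree but missing from the child tree."""
    return sorted(set(parent_tree) - set(child_tree))

def restore(good_tree, additions):
    """Forward-only fix: start from the last good tree, then apply the intended additions."""
    fixed = dict(good_tree)
    fixed.update(additions)
    return fixed

# Hypothetical stand-in for the full tree of the last good commit.
good_tree = {
    "README.md": "docs",
    "scripts/tg": "wrapper",
    "stacks/monitoring/main.tf": "monitoring",
}

# The two paths the commit was meant to touch (the second path is hypothetical).
additions = {
    "stacks/stem95su/gdrive-sync.tf": "cronjob",
    "service-catalog.md": "stem95su entry",
}

# An empty index plus two staged files yields a tree with only those two files.
broken_tree = commit_from_index(additions)

# The repair commit carries the good tree plus the additions.
fixed_tree = restore(good_tree, additions)
```

In this model, dropped_paths(good_tree, broken_tree) lists every pre-existing file, while dropped_paths(good_tree, fixed_tree) is empty.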
Viktor Barzin
6d224861c4 stem95su: scheduled Drive->site sync CronJob (every 10m)
CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app published to Production or the
refresh token expires ~weekly.

Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:42:26 +00:00
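
The combined effect of the two safety measures named in this entry (the empty-source guard and --max-delete 25) can be sketched as a decision function. This is an illustration only, not the CronJob's actual script: the function name, its inputs, and the name-only comparison are assumptions, and rclone applies its --max-delete limit itself.

```python
def plan_sync(source_files, dest_files, max_delete=25):
    """Decide whether a one-way sync should run (hypothetical helper).

    Files are compared by name only; a real sync also compares size,
    modification time, or checksums.
    """
    source = set(source_files)
    dest = set(dest_files)

    # Empty-source guard: an empty listing more likely means an auth or
    # API problem than a genuinely emptied folder, so do not wipe the site.
    if not source:
        return ("skip", "source listing is empty")

    # Delete limit: refuse a run that would remove too many files at once.
    to_delete = dest - source
    if len(to_delete) > max_delete:
        return ("abort", f"{len(to_delete)} deletions exceed limit of {max_delete}")

    to_copy = source - dest
    return ("sync", f"copy {len(to_copy)}, delete {len(to_delete)}")
```

For example, plan_sync([], ["index.html"]) returns a "skip" decision instead of deleting index.html from the destination.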
Viktor Barzin
e692a0a0c5 ci: retrigger image rebuild — prior pipeline aborted during PG outage
2026-05-11 19:30:34 +00:00
Viktor Barzin
aee434469c ci: add python3 to infra-ci image — unblocks scripts/tg auth-comment check
Commit 0712a1b6 added a Python-based ingress_factory auth-comment check
that runs from scripts/tg on every plan/apply. The CI image
(forgejo.viktorbarzin.me/viktor/infra-ci) doesn't ship python3, so every
CI apply has been failing since with:

  env: can't execute 'python3': No such file or directory

Adding python3 to the apk install line restores CI applies for all stacks.
The build-ci-image.yml pipeline auto-fires on this commit (path filter
on ci/Dockerfile), so the rebuild + retag happens without manual action.
2026-05-11 19:26:55 +00:00
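
The path-filter behaviour this entry relies on can be sketched as follows. This is a simplified model rather than Woodpecker's matcher: the function name is hypothetical, and Python's fnmatchcase lets `*` match across `/`, which real CI glob matching may not.

```python
from fnmatch import fnmatchcase

def pipeline_fires(changed_paths, include_patterns):
    """Return True if any changed path matches any include pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_paths
        for pattern in include_patterns
    )

# A commit touching ci/Dockerfile triggers the image build;
# a commit touching only a stack does not.
dockerfile_commit = pipeline_fires(["ci/Dockerfile"], ["ci/Dockerfile"])
stack_only_commit = pipeline_fires(["stacks/stem95su/gdrive-sync.tf"], ["ci/Dockerfile"])
```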
Viktor Barzin
3eb8b9a4ea ci: add vault CLI to infra-ci image + surface real errors in scripts/tg
The Woodpecker CI pipeline has been silently failing to apply Tier 1
stacks since the state-migration commit e80b2f02 because the Alpine
CI image never had the vault CLI. `scripts/tg` swallowed stderr with
`2>/dev/null` and surfaced a misleading "Cannot read PG credentials
from Vault" message — the real error was `sh: vault: not found`.

Verified with an in-cluster probe: woodpecker/default SA + role=ci
already gets the terraform-state policy and has read capability on
database/static-creds/pg-terraform-state. Auth was never the problem;
the vault binary just wasn't there.

- ci/Dockerfile: pin vault v1.18.1 (matches server) and install
- scripts/tg: pre-flight check + surface real vault output on failure
- Next build-ci-image.yml run rebuilds :latest with vault included;
  subsequent default.yml runs unblock monitoring apply (code-aoxk)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 08:46:50 +00:00
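
The two scripts/tg changes listed in this entry (a pre-flight check and surfacing the tool's real output) follow a general pattern, sketched here in Python. This is not the contents of scripts/tg; ToolError and run_tool are hypothetical names used for illustration.

```python
import shutil
import subprocess

class ToolError(RuntimeError):
    """Raised with the underlying cause instead of a generic message."""

def run_tool(argv):
    """Run an external command, failing loudly and specifically."""
    binary = argv[0]

    # Pre-flight: distinguish "binary missing" from "command failed".
    if shutil.which(binary) is None:
        raise ToolError(f"{binary}: not found on PATH")

    # Capture stderr rather than discarding it (the 2>/dev/null anti-pattern).
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise ToolError(
            f"{' '.join(argv)} exited {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

With this shape, a missing binary is reported as missing, and a real authentication failure would carry the tool's own stderr text, so the two cases cannot be confused.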
Viktor Barzin
6371e75ef9 [ci] Rebuild infra-ci image — registry index referenced missing blobs
The infra-ci :latest (and :5319f03e) tags in the private registry resolved
to an OCI image index (sha256:7235cba7...) whose referenced amd64 manifest
(98f718c8) and attestation (27d5ab83) blobs returned 404 — either never
uploaded or garbage-collected. Every pipeline since P366 exited 126 on
image pull.

This comment-only Dockerfile change triggers build-ci-image.yml's path
filter, which rebuilds + pushes a fresh image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:29:20 +00:00
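
The breakage described in this entry, an image index pointing at content the registry can no longer serve, can be detected with a check along these lines. This is a sketch: the digests are hypothetical placeholders, and blob_exists is an assumed callable that a real check would back with a registry lookup.

```python
def missing_references(index, blob_exists):
    """Return digests listed in an OCI image index that cannot be fetched.

    `index` is the parsed index JSON; `blob_exists` is a caller-supplied
    function taking a digest string and returning True or False.
    """
    return [
        entry["digest"]
        for entry in index.get("manifests", [])
        if not blob_exists(entry["digest"])
    ]

# Hypothetical index with a platform manifest and an attestation manifest.
example_index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:aaaa", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:bbbb", "platform": {"architecture": "unknown", "os": "unknown"}},
    ],
}

# Simulated registry in which only the first digest is still present.
available = {"sha256:aaaa"}
missing = missing_references(example_index, lambda digest: digest in available)
```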
Viktor Barzin
36454b87d1 feat: CI/CD performance overhaul
- New custom CI Docker image (ci/Dockerfile) with TF 1.5.7, TG 0.99.4,
  git-crypt, sops, kubectl pre-installed. Pushed to private registry.
  Eliminates 17 apk add calls + binary downloads per pipeline run.

- Unified CI pipeline: merge default.yml + app-stacks.yml into one.
  Changed-stacks-only detection (git diff, with global-file fallback).
  Concurrency limit (xargs -P 4). Step consolidation (2 steps vs 4).
  Shallow clone (depth=2). Provider cache (TF_PLUGIN_CACHE_DIR).

- Per-stack Vault advisory locks in scripts/tg. 30min TTL with stale
  lock detection. Blocks concurrent applies to same stack.

- TF_PLUGIN_CACHE_DIR enabled by default in scripts/tg for local dev.

- Daily drift detection pipeline (.woodpecker/drift-detection.yml).
  Runs terraform plan on all stacks, Slack alert on drift.

- CI image build pipeline (.woodpecker/build-ci-image.yml).
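
The changed-stacks-only detection with global-file fallback listed above can be sketched as a pure function over the output of git diff. This is an illustration under assumptions: a stacks/<name>/... layout is assumed, the function name is hypothetical, and which paths count as global is not stated in this log, so the prefixes in the example are placeholders.

```python
def stacks_to_apply(changed_paths, all_stacks, global_prefixes):
    """Map changed file paths to the stacks that need a plan/apply."""
    changed = set()
    for path in changed_paths:
        # Fallback: a change to a shared file may affect every stack.
        if any(path.startswith(prefix) for prefix in global_prefixes):
            return sorted(all_stacks)
        parts = path.split("/")
        if len(parts) >= 3 and parts[0] == "stacks" and parts[1] in all_stacks:
            changed.add(parts[1])
    return sorted(changed)

# Hypothetical stack names and global prefix.
ALL_STACKS = {"monitoring", "stem95su", "vault"}
GLOBAL_PREFIXES = ("scripts/",)

only_one = stacks_to_apply(["stacks/stem95su/gdrive-sync.tf"], ALL_STACKS, GLOBAL_PREFIXES)
everything = stacks_to_apply(["scripts/tg"], ALL_STACKS, GLOBAL_PREFIXES)
```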

Expected speedup: ~5-10 min per pipeline run → ~2-4 min.

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:22:26 +00:00
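
The per-stack advisory lock with a 30-minute TTL and stale-lock detection mentioned in this entry can be modelled as follows. This is a minimal sketch, not the scripts/tg logic: an in-memory dict stands in for Vault, the function and field names are hypothetical, and a real implementation needs an atomic check-and-set.

```python
import time

LOCK_TTL_SECONDS = 30 * 60  # 30 min TTL, as stated in the commit message

def try_acquire(locks, stack, owner, now=None):
    """Take the lock for `stack` unless a fresh lock is already held."""
    now = time.time() if now is None else now
    held = locks.get(stack)

    # A lock younger than the TTL blocks a concurrent apply.
    if held is not None and now - held["acquired_at"] < LOCK_TTL_SECONDS:
        return False

    # No lock, or a stale one left by a crashed run: take it over.
    locks[stack] = {"owner": owner, "acquired_at": now}
    return True

def release(locks, stack, owner):
    """Release the lock only if `owner` still holds it."""
    if locks.get(stack, {}).get("owner") == owner:
        del locks[stack]
```

Locks are keyed by stack name, so applies to different stacks do not block each other.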