infra

Author	SHA1	Message	Date
Viktor Barzin	a7704f46a6	deploy goldmane-edge-aggregator: durable who-talks-to-whom edge trail (#58, ADR-0014) Infra side of ADR-0014: an mTLS gRPC consumer of Calico Goldmane's Flows API that records the namespace-pair edge set in CNPG and posts a daily new-edge digest to #security. Adds the goldmane-edge-aggregator stack, the pg-goldmane-edges Vault rotation role (Tier-0 vault state updated here), and the namespace to the ghcr-credentials allowlist. Cert: REUSES the operator-minted, Tigera-CA-signed whisker-backend client cert (Goldmane verifies only the CA chain, not identity) instead of minting one from the Tigera CA private key. This avoids putting the CA key in TF state AND avoids the hashicorp/tls provider, which is incompatible with this repo's global generate-providers/lockfile pattern (it broke every stack's lockfile). Verified live: aggregator streaming flows, 174 edges in Postgres across 50x54 namespaces, db+slack ExternalSecrets synced, digest dry-run formats correctly, private image pulls work via the Kyverno-synced ghcr-credentials. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-24 20:59:39 +00:00
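The commit above records a namespace-pair edge set and posts a daily digest of *new* edges. The core of such a digest is a set difference between the edges already recorded and the edges observed since the last run. The bash sketch below shows that idea under one assumption: plain text files, with one `src_ns dst_ns` pair per line, stand in for the CNPG table. The function names `new_edges` and `record_edges` are hypothetical and do not come from the aggregator.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the real aggregator stores edges in Postgres (CNPG).
# Here plain text files stand in for that table, one edge per line: "src_ns dst_ns".
set -euo pipefail

# new_edges KNOWN OBSERVED
# Prints edges present in OBSERVED but not yet in KNOWN, deduplicated and
# sorted: the content of a "new-edge digest".
new_edges() {
  local known=$1 observed=$2
  # comm needs sorted input; -13 keeps only lines unique to the second input.
  comm -13 <(sort -u "$known") <(sort -u "$observed")
}

# record_edges KNOWN OBSERVED
# Merges OBSERVED into KNOWN so the edge set only ever grows (a durable trail).
record_edges() {
  local known=$1 observed=$2 tmp
  tmp=$(mktemp)
  sort -u "$known" "$observed" > "$tmp"
  mv "$tmp" "$known"
}
```

A Postgres-backed version would more plausibly rely on a unique constraint over the namespace pair plus a first-seen timestamp. That schema is also an assumption; the commit does not describe it.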
Viktor Barzin	b0ccaf1c65	state(vault): update encrypted state	2026-06-21 15:07:01 +00:00
Viktor Barzin	f84e6818b2	state(vault): update encrypted state	2026-06-21 15:07:01 +00:00
Viktor Barzin	524b874036	state(vault): update encrypted state	2026-06-20 20:14:53 +00:00
Viktor Barzin	fd0f4a0365	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip] `6d224861` came from a --no-checkout worktree whose empty index made the commit drop every file except two. This restores 05b50d2b's full tree and correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the live infra was never applied from the broken commit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 08:45:33 +00:00
Viktor Barzin	6d224861c4	stem95su: scheduled Drive->site sync CronJob (every 10m) CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard + --max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault secret/stem95su. Requires the GCP OAuth app to be published to Production; otherwise the refresh token expires roughly weekly. Lands the gdrive-sync stack on master (it had accidentally landed on a feature branch in the shared devvm checkout). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 08:42:26 +00:00
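The CronJob above pairs an empty-source guard with `--max-delete 25`. Both guard against the same failure: `rclone sync` makes the destination match the source, so a source that lists as empty would wipe the site content. The bash sketch below is a hypothetical container entrypoint. The remote `gdrive:claude`, the mount path `/content`, and the function name `guarded_sync` are illustrative assumptions, not values taken from `stacks/stem95su/gdrive-sync.tf`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a guarded Drive -> PVC sync entrypoint.
# "gdrive:claude" and "/content" are illustrative defaults, not the stack's values.
set -euo pipefail

SRC=${SRC:-gdrive:claude}
DST=${DST:-/content}
MAX_DELETE=${MAX_DELETE:-25}

guarded_sync() {
  local listing
  # Abort if the source cannot be listed at all.
  listing=$(rclone lsf "$SRC") || return 1
  # Empty-source guard: syncing an empty source would delete everything
  # in DST, so refuse instead.
  if [ -z "$listing" ]; then
    echo "source $SRC is empty; refusing to sync" >&2
    return 1
  fi
  # Second safety net: rclone stops with an error once deletions exceed MAX_DELETE.
  rclone sync "$SRC" "$DST" --max-delete "$MAX_DELETE"
}
```

The container command would then just call `guarded_sync`. A non-zero exit marks that CronJob run as failed rather than silently emptying the PVC.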
Viktor Barzin	09514a234b	state(vault): update encrypted state	2026-06-08 11:51:06 +00:00
Viktor Barzin	90d7c11c16	state(vault): update encrypted state	2026-06-05 09:19:10 +00:00
Viktor Barzin	d6a61f00ad	state(vault): update encrypted state	2026-05-30 07:59:28 +00:00
Viktor Barzin	c7b0ebf6a5	state(vault): update encrypted state	2026-05-22 10:04:55 +00:00
Viktor Barzin	9247a68514	state(vault): update encrypted state	2026-05-21 08:09:11 +00:00
Viktor Barzin	3af3f0507b	state(vault): update encrypted state	2026-05-18 19:33:17 +00:00
Viktor Barzin	0eb5c8c292	state(vault): update encrypted state	2026-05-18 19:17:04 +00:00
Viktor Barzin	c5ebbc07e4	state(vault): update encrypted state	2026-05-15 22:46:37 +00:00
Viktor Barzin	407a17d8cd	state(vault): update encrypted state	2026-05-11 19:40:05 +00:00
Viktor Barzin	30cdd05bd8	state(vault): update encrypted state	2026-05-10 16:28:09 +00:00
Viktor Barzin	79caba9904	state(vault): update encrypted state	2026-05-07 22:53:04 +00:00
Viktor Barzin	df2fa0a31d	state(vault): update encrypted state	2026-04-25 17:09:35 +00:00
Viktor Barzin	7dd580972a	state(vault): update encrypted state	2026-04-25 16:57:42 +00:00
Viktor Barzin	08b13858dd	state(vault): update encrypted state	2026-04-25 16:16:35 +00:00
Viktor Barzin	3f85cee1ef	state(vault): update encrypted state	2026-04-25 16:08:38 +00:00
Viktor Barzin	2eca011cc3	[ci,vault] Fix Tier-1 apply silently failing in Woodpecker

## Context

For weeks, every push to infra has resulted in the `build-cli` workflow failing AND the `default` workflow succeeding, but the `default` workflow's "success" was a lie. Inside the apply loop we were swallowing per-stack failures with `set +e ... echo FAILED`, and the step exited 0 regardless.

Discovered during the bd code-3o3 e2e test (qbittorrent 5.0.4 → 5.1.4): the agent commit landed and CI reported `default=success`, but the cluster was unchanged. The log inside the step showed:

```
[servarr] Starting apply...
ERROR: Cannot read PG credentials from Vault. Run: vault login -method=oidc
[servarr] FAILED (exit 1)
```

Two root causes, two fixes here.

### 1. Vault `ci` role lacks Tier-1 PG backend creds

The Tier-1 PG state backend (2026-04-16 migration, memory 407) uses the `pg-terraform-state` static DB role. `scripts/tg` reads it via `vault read database/static-creds/pg-terraform-state`. That path is permitted by the separate `terraform-state` Vault policy, which is bound only to a role in namespace `claude-agent`. The CI runner is in namespace `woodpecker` using role `ci`, whose policy grants only KV + K8s-creds + transit. Net: every Tier-1 stack apply from CI has been dying at the PG-creds fetch since the migration.

Fix: attach `vault_policy.terraform_state` to `vault_kubernetes_auth_backend_role.ci`'s `token_policies`. No new policy is needed; this reuses the minimal one from 2026-04-16.

### 2. Apply loop swallows stack failures

`.woodpecker/default.yml`'s platform and app apply loops use `set +e; OUTPUT=$(... tg apply ...); EXIT=$?; set -e; [ $EXIT -ne 0 ] && echo FAILED` and then continue the while loop. The step never re-raises, so it exits 0 regardless of how many stacks failed.

Fix: accumulate failed stack names (excluding lock-skipped ones) into `FAILED_PLATFORM_STACKS` / `FAILED_APP_STACKS`, serialise the platform list to `.platform_failed` so it survives the step boundary, and at the end of the app-stack step exit 1 if either list is non-empty. Lock-skipped stacks remain non-fatal.

Together, (1) unblocks real applies and (2) ensures that both the Woodpecker pipeline and the service-upgrade agent can trust the `default` workflow state again.

## What is NOT in this change

- Re-running the qbittorrent upgrade to converge the cluster: the TF file is already at 5.1.4 in git. Once CI picks up this commit it will apply on its own, or Viktor can run `tg apply` locally now that the ci role has access too.
- Retiring the `set +e ... continue` pattern entirely: the per-stack continuation stays so that a single bad stack doesn't hide the other stacks' plans from the log. This change only makes the final status honest.

## Test Plan

### Automated

`terraform plan` / apply clean (Tier-0 via scripts/tg):

```
Plan: 0 to add, 2 to change, 0 to destroy.

  # vault_kubernetes_auth_backend_role.ci will be updated in-place
  ~ token_policies = [
      + "terraform-state",
        # (1 unchanged element hidden)
    ]

  # vault_jwt_auth_backend.oidc will be updated in-place
  ~ tune = [...]   # cosmetic provider-schema drift, pre-existing

Apply complete! Resources: 0 added, 2 changed, 0 destroyed.
```

State re-encrypted via `scripts/state-sync encrypt vault`; enc file committed.

### Manual Verification

```
# Before (on previous commit, expect failure):
$ kubectl -n woodpecker exec woodpecker-server-0 -- sh -c '
    SA=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token);
    TOK=$(curl -s -X POST http://vault-active.vault.svc:8200/v1/auth/kubernetes/login \
      -d "{\"role\":\"ci\",\"jwt\":\"$SA\"}" | jq -r .auth.client_token);
    curl -s -H "X-Vault-Token: $TOK" \
      http://vault-active.vault.svc:8200/v1/database/static-creds/pg-terraform-state'
→ {"errors":["1 error occurred:\n\t* permission denied\n\n"]}

# After (this commit):
→ {"data":{"username":"terraform_state","password":"..."},...}
```

Pipeline-level: the next infra push will exercise `.woodpecker/default.yml`, and the expected first such push is this very commit. Watch `ci.viktorbarzin.me`: the `default` workflow should either succeed for real (and land actual changes) or exit 1 with "=== FAILED STACKS ===" so the cause is visible.

Refs: bd code-e1x

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 14:25:52 +00:00
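Fix 2 in the commit above keeps the per-stack continuation but makes the final status honest. Real failures accumulate, the platform list is persisted across the step boundary, the last step fails if anything failed, and lock-skips stay non-fatal. The bash sketch below illustrates that pattern and is not the actual `.woodpecker/default.yml`. The helpers `apply_stacks`, `save_failed` and `final_status` are hypothetical, and `tg apply STACK` stands in for the real per-stack invocation. Recognising a lock-skip by Terraform's "Error acquiring the state lock" message is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "keep going, but report honestly" apply loop (bash 4.4+).
# `tg apply STACK` is a stand-in for the real invocation; matching
# "Error acquiring the state lock" to detect lock-skips is an assumption.
set -euo pipefail

# apply_stacks ARRAY_NAME STACK...
# Applies every stack even after failures; appends only real failures
# (not lock-skips) to the caller's array named ARRAY_NAME.
apply_stacks() {
  local -n failed_ref=$1
  shift
  local stack output rc
  for stack in "$@"; do
    echo "[$stack] Starting apply..."
    # A failing command inside an && / || list does not trip `set -e`.
    output=$(tg apply "$stack" 2>&1) && rc=0 || rc=$?
    if [ "$rc" -eq 0 ]; then
      echo "[$stack] OK"
    elif grep -q 'Error acquiring the state lock' <<<"$output"; then
      echo "[$stack] SKIPPED (state locked, non-fatal)"
    else
      echo "[$stack] FAILED (exit $rc)"
      failed_ref+=("$stack")
    fi
  done
}

# save_failed FILE STACK...
# Shell variables do not survive a step boundary, so persist the list one
# name per line; an empty file means "no failures".
save_failed() {
  local file=$1
  shift
  : > "$file"
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@" > "$file"
  fi
}

# final_status FILE STACK...
# Merges the persisted list with this step's failures; prints them and
# returns 1 if there are any, so the step goes red.
final_status() {
  local file=$1
  shift
  local -a all=()
  if [ -f "$file" ]; then
    mapfile -t all < "$file"
  fi
  all+=("$@")
  if [ "${#all[@]}" -gt 0 ]; then
    echo "=== FAILED STACKS ==="
    printf '%s\n' "${all[@]}"
    return 1
  fi
}
```

In this sketch the platform step would end with `save_failed .platform_failed "${FAILED_PLATFORM_STACKS[@]}"`. The app step would end with `final_status .platform_failed "${FAILED_APP_STACKS[@]}"`, whose non-zero return fails the step under `set -e`. Writing an empty file when nothing failed avoids the trap where `printf '%s\n'` with no arguments emits a blank line that would later read back as one "failed" stack.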
Viktor Barzin	9e5d7cd825	state(vault): update encrypted state	2026-04-18 22:12:55 +00:00
Viktor Barzin	1860cd1dfb	state(vault): update encrypted state	2026-04-17 14:14:05 +00:00
Viktor Barzin	14fa2b9762	state(vault): update encrypted state	2026-04-16 18:43:06 +00:00
Viktor Barzin	a34df78158	state(vault): update encrypted state	2026-04-16 10:24:29 +00:00
Viktor Barzin	aac81e0a1f	state(vault): update encrypted state	2026-04-14 11:06:27 +00:00
Viktor Barzin	0eb96e4e22	state(vault): update encrypted state	2026-04-13 23:04:57 +01:00
Viktor Barzin	b7aec4c617	state: update encrypted terraform state	2026-04-12 14:17:12 +01:00
Viktor Barzin	8363efc56b	state: update encrypted terraform state	2026-04-12 12:59:01 +01:00
Viktor Barzin	c54a36e7ca	state(vault): update encrypted state	2026-04-10 13:33:33 +00:00
Viktor Barzin	cd2d00703c	state(vault): update encrypted state	2026-04-06 12:40:54 +03:00
Viktor Barzin	9f91a3db88	state: update encrypted terraform state	2026-04-06 11:26:45 +03:00
Viktor Barzin	f48e400087	state(vault): update encrypted state	2026-04-04 16:10:25 +03:00
Viktor Barzin	e65647edb4	state(vault): add vabbit81 user resources	2026-03-26 17:32:34 +02:00
Viktor Barzin	b6ac68d7f2	state(vault): update encrypted state	2026-03-26 12:21:23 +02:00
Viktor Barzin	45cb49416e	state(vault): update encrypted state	2026-03-25 02:48:15 +02:00
Viktor Barzin	41f53a0f3e	state(vault): update encrypted state	2026-03-25 02:24:45 +02:00
Viktor Barzin	ab95e0ab2f	state(vault): update encrypted state	2026-03-22 15:18:03 +02:00
Viktor Barzin	527bfb1c9e	state(vault): update encrypted state	2026-03-22 01:13:02 +02:00
Viktor Barzin	03f55d969f	state(vault): update encrypted state	2026-03-18 21:30:59 +00:00
Viktor Barzin	5b29cfc73a	state(vault): update encrypted state	2026-03-17 23:46:56 +00:00
Viktor Barzin	4d40c51a97	state(vault): update encrypted state	2026-03-17 23:14:24 +00:00
Viktor Barzin	7a8452e4c7	state(vault): update encrypted state	2026-03-17 23:14:16 +00:00
Viktor Barzin	0215d81622	state(vault): update encrypted state	2026-03-17 23:13:57 +00:00
Viktor Barzin	750cfcce7c	state(vault): update encrypted state	2026-03-17 23:13:55 +00:00
Viktor Barzin	e54ad33315	state(vault): update encrypted state	2026-03-17 23:13:19 +00:00
Viktor Barzin	02d0291797	state(vault): update encrypted state	2026-03-17 23:12:58 +00:00
Viktor Barzin	468df3c5c4	state(vault): update encrypted state	2026-03-17 23:12:35 +00:00
Viktor Barzin	cf570c3d3b	state(vault): update encrypted state	2026-03-17 23:12:03 +00:00

55 commits