Commit graph

55 commits

Author SHA1 Message Date
Viktor Barzin
a7704f46a6 deploy goldmane-edge-aggregator: durable who-talks-to-whom edge trail (#58, ADR-0014)
Infra side of ADR-0014: an mTLS gRPC consumer of Calico Goldmane's Flows API
that records the namespace-pair edge-set in CNPG and posts a daily new-edge
digest to #security. Adds the goldmane-edge-aggregator stack, the
pg-goldmane-edges Vault rotation role (Tier-0 vault state updated here), and the
namespace in the ghcr-credentials allowlist.

Cert: REUSES the operator-minted, Tigera-CA-signed whisker-backend client cert
(Goldmane verifies only the CA chain, not identity) instead of minting from the
Tigera CA private key. This avoids both putting the CA key in TF state and
pulling in the hashicorp/tls provider, which is incompatible with this repo's
global generate-providers/lockfile pattern (it broke every stack's lockfile).

Verified live: aggregator streaming flows, 174 edges in Postgres across 50x54
namespaces, db+slack ExternalSecrets synced, digest dry-run formats correctly,
private image pulls via the Kyverno-synced ghcr-credentials.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 20:59:39 +00:00
Viktor Barzin
b0ccaf1c65 state(vault): update encrypted state 2026-06-21 15:07:01 +00:00
Viktor Barzin
f84e6818b2 state(vault): update encrypted state 2026-06-21 15:07:01 +00:00
Viktor Barzin
524b874036 state(vault): update encrypted state
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
2026-06-20 20:14:53 +00:00
Viktor Barzin
fd0f4a0365 fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]
6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:45:33 +00:00
Viktor Barzin
6d224861c4 stem95su: scheduled Drive->site sync CronJob (every 10m)
CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app to be published to Production;
otherwise the refresh token expires roughly weekly.
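
The empty-source guard and delete cap described above could look roughly like
the following. This is a minimal sketch, not the CronJob's actual script: the
`guarded_sync` function, its exit codes, and the overridable `RCLONE` variable
are assumptions made for illustration. Only `rclone lsf`, `rclone sync` and
`--max-delete 25` come from rclone itself and the commit message.

```shell
#!/bin/sh
# Assumption: RCLONE can be overridden (for example with a stub) for testing.
RCLONE=${RCLONE:-rclone}

# Hypothetical helper: sync SRC onto DST, but refuse to run when SRC lists
# no files, so a broken or empty remote cannot wipe the destination.
guarded_sync() {
    src=$1
    dst=$2
    # `rclone lsf -R --files-only` prints one line per file under SRC.
    listing=$($RCLONE lsf -R --files-only "$src") || return 2
    if [ -z "$listing" ]; then
        echo "source $src is empty, refusing to sync" >&2
        return 3
    fi
    # --max-delete makes rclone stop once the sync would exceed 25 deletions.
    $RCLONE sync "$src" "$dst" --max-delete 25
}
```

A call such as `guarded_sync remote:claude /content` would then be the whole
job body; both the remote name and the mount path here are placeholders.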

Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:42:26 +00:00
Viktor Barzin
09514a234b state(vault): update encrypted state 2026-06-08 11:51:06 +00:00
Viktor Barzin
90d7c11c16 state(vault): update encrypted state 2026-06-05 09:19:10 +00:00
Viktor Barzin
d6a61f00ad state(vault): update encrypted state 2026-05-30 07:59:28 +00:00
Viktor Barzin
c7b0ebf6a5 state(vault): update encrypted state 2026-05-22 10:04:55 +00:00
Viktor Barzin
9247a68514 state(vault): update encrypted state 2026-05-21 08:09:11 +00:00
Viktor Barzin
3af3f0507b state(vault): update encrypted state 2026-05-18 19:33:17 +00:00
Viktor Barzin
0eb5c8c292 state(vault): update encrypted state 2026-05-18 19:17:04 +00:00
Viktor Barzin
c5ebbc07e4 state(vault): update encrypted state 2026-05-15 22:46:37 +00:00
Viktor Barzin
407a17d8cd state(vault): update encrypted state 2026-05-11 19:40:05 +00:00
Viktor Barzin
30cdd05bd8 state(vault): update encrypted state 2026-05-10 16:28:09 +00:00
Viktor Barzin
79caba9904 state(vault): update encrypted state 2026-05-07 22:53:04 +00:00
Viktor Barzin
df2fa0a31d state(vault): update encrypted state 2026-04-25 17:09:35 +00:00
Viktor Barzin
7dd580972a state(vault): update encrypted state 2026-04-25 16:57:42 +00:00
Viktor Barzin
08b13858dd state(vault): update encrypted state 2026-04-25 16:16:35 +00:00
Viktor Barzin
3f85cee1ef state(vault): update encrypted state 2026-04-25 16:08:38 +00:00
Viktor Barzin
2eca011cc3 [ci,vault] Fix Tier-1 apply silently failing in Woodpecker
## Context
For weeks, every push to infra has ended with the `build-cli` workflow
failing AND the `default` workflow succeeding, but the `default` workflow's
"success" was a lie. Inside the apply-loop we were swallowing per-stack
failures with `set +e ... echo FAILED` and the step exited 0 regardless.

Discovered during the bd code-3o3 e2e test (qbittorrent 5.0.4 → 5.1.4):
the agent commit landed, CI reported `default=success`, but the cluster was
unchanged. The log inside the step showed:
    [servarr] Starting apply...
    ERROR: Cannot read PG credentials from Vault.
    Run: vault login -method=oidc
    [servarr] FAILED (exit 1)

Two root causes, two fixes here.

### 1. Vault `ci` role lacks Tier-1 PG backend creds

The Tier-1 PG state backend (2026-04-16 migration, memory 407) uses
the `pg-terraform-state` static DB role. `scripts/tg` reads it via
`vault read database/static-creds/pg-terraform-state`. That path is
permitted by the separate `terraform-state` Vault policy, which is
bound only to a role in namespace `claude-agent`. The CI runner is in
namespace `woodpecker` using role `ci`, whose policy grants only KV
+ K8s-creds + transit. Net: every Tier-1 stack apply from CI has
been dying at the PG-creds fetch since the migration.

**Fix**: attach `vault_policy.terraform_state` to
`vault_kubernetes_auth_backend_role.ci`'s `token_policies`. No new
policy needed — reuses the minimal one from 2026-04-16.

### 2. Apply-loop swallows stack failures

`.woodpecker/default.yml`'s platform + app apply loops use
`set +e; OUTPUT=$(... tg apply ...); EXIT=$?; set -e; [ $EXIT -ne 0 ]
&& echo FAILED` and then continue the while-loop. The step never
re-raises, so it exits 0 regardless of how many stacks failed.

**Fix**: accumulate failed stack names (excluding lock-skipped ones)
into `FAILED_PLATFORM_STACKS` / `FAILED_APP_STACKS`, serialise the
platform list to `.platform_failed` so it survives the step boundary,
and at the end of the app-stack step exit 1 if either list is
non-empty. Lock-skipped stacks remain non-fatal.
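
The accumulate-then-fail pattern described above can be sketched as follows.
This is an illustration under stated assumptions, not the contents of
`.woodpecker/default.yml`: `apply_stack` is a hypothetical stand-in for the
real `tg apply` call, exit code 75 as the "lock-skipped" signal is invented
for the sketch, and the function names are placeholders. Only the variable
names, the `.platform_failed` file and the "=== FAILED STACKS ===" banner
come from the commit message.

```shell
#!/bin/sh
# Hypothetical stand-in for running `tg apply` on one stack.
# Assumption: 0 = applied, 75 = skipped because the state is locked,
# anything else = real failure.
apply_stack() {
    case "$1" in
        servarr) echo "ERROR: Cannot read PG credentials from Vault."; return 1 ;;
        locked-stack) echo "state locked, skipping"; return 75 ;;
        *) echo "Apply complete!"; return 0 ;;
    esac
}

# Apply every stack passed as an argument, continue after failures, and
# print the names of the stacks that really failed (space separated).
apply_all() {
    failed=""
    for stack in "$@"; do
        echo "[$stack] Starting apply..." >&2
        if output=$(apply_stack "$stack" 2>&1); then
            echo "[$stack] OK" >&2
        else
            status=$?
            echo "$output" >&2
            if [ "$status" -eq 75 ]; then
                echo "[$stack] SKIPPED (locked)" >&2   # non-fatal
            else
                echo "[$stack] FAILED (exit $status)" >&2
                failed="$failed $stack"
            fi
        fi
    done
    echo "${failed# }"
}

# Platform step: persist the list so it survives the step boundary.
platform_step() {
    apply_all "$@" > .platform_failed
}

# App step: fail the pipeline if either list is non-empty.
app_step() {
    FAILED_APP_STACKS=$(apply_all "$@")
    FAILED_PLATFORM_STACKS=$(cat .platform_failed 2>/dev/null || true)
    if [ -n "$FAILED_PLATFORM_STACKS$FAILED_APP_STACKS" ]; then
        echo "=== FAILED STACKS ===" >&2
        echo "platform: $FAILED_PLATFORM_STACKS" >&2
        echo "apps: $FAILED_APP_STACKS" >&2
        return 1
    fi
    return 0
}
```

In a pipeline, the last command of the app-stack step would be `app_step ...`
(or `app_step ... || exit 1`), so every stack still gets its turn in the log
while the step's final status reflects any real failure.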

Together, (1) unblocks real apply and (2) ensures the Woodpecker
pipeline + the service-upgrade agent can both trust `default`
workflow state again.

## What is NOT in this change
- Re-running the qbittorrent upgrade to converge the cluster — the
  TF file is already at 5.1.4 in git; once CI picks up this commit
  it'll apply on its own, or Viktor can run `tg apply` locally now
  that the ci role has access too.
- Retiring the `set +e ... continue` pattern entirely — keeping the
  per-stack continuation so a single bad stack doesn't hide the
  others' plans from the log. Just making the final status honest.

## Test Plan
### Automated
`terraform plan` / apply clean (Tier-0 via scripts/tg):
```
Plan: 0 to add, 2 to change, 0 to destroy.
  # vault_kubernetes_auth_backend_role.ci will be updated in-place
  ~ token_policies = [
      + "terraform-state",
        # (1 unchanged element hidden)
    ]
  # vault_jwt_auth_backend.oidc will be updated in-place
  ~ tune = [...]    # cosmetic provider-schema drift, pre-existing

Apply complete! Resources: 0 added, 2 changed, 0 destroyed.
```
State re-encrypted via `scripts/state-sync encrypt vault`; enc file
committed.

### Manual Verification
```
# Before (on previous commit — expect failure):
$ kubectl -n woodpecker exec woodpecker-server-0 -- sh -c '
    SA=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token);
    TOK=$(curl -s -X POST http://vault-active.vault.svc:8200/v1/auth/kubernetes/login \
          -d "{\"role\":\"ci\",\"jwt\":\"$SA\"}" | jq -r .auth.client_token);
    curl -s -H "X-Vault-Token: $TOK" \
      http://vault-active.vault.svc:8200/v1/database/static-creds/pg-terraform-state'
→ {"errors":["1 error occurred:\n\t* permission denied\n\n"]}

# After (this commit):
→ {"data":{"username":"terraform_state","password":"..."},...}
```

Pipeline-level: the next infra push will exercise
`.woodpecker/default.yml`; the first such push is expected to be this very
commit. Watch `ci.viktorbarzin.me`: the `default` workflow should either
succeed for real (and land actual changes) or exit 1 with
"=== FAILED STACKS ===" so the cause is visible.

Refs: bd code-e1x

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:25:52 +00:00
Viktor Barzin
9e5d7cd825 state(vault): update encrypted state 2026-04-18 22:12:55 +00:00
Viktor Barzin
1860cd1dfb state(vault): update encrypted state 2026-04-17 14:14:05 +00:00
Viktor Barzin
14fa2b9762 state(vault): update encrypted state 2026-04-16 18:43:06 +00:00
Viktor Barzin
a34df78158 state(vault): update encrypted state 2026-04-16 10:24:29 +00:00
Viktor Barzin
aac81e0a1f state(vault): update encrypted state 2026-04-14 11:06:27 +00:00
Viktor Barzin
0eb96e4e22 state(vault): update encrypted state 2026-04-13 23:04:57 +01:00
Viktor Barzin
b7aec4c617 state: update encrypted terraform state 2026-04-12 14:17:12 +01:00
Viktor Barzin
8363efc56b state: update encrypted terraform state 2026-04-12 12:59:01 +01:00
Viktor Barzin
c54a36e7ca state(vault): update encrypted state 2026-04-10 13:33:33 +00:00
Viktor Barzin
cd2d00703c state(vault): update encrypted state 2026-04-06 12:40:54 +03:00
Viktor Barzin
9f91a3db88 state: update encrypted terraform state 2026-04-06 11:26:45 +03:00
Viktor Barzin
f48e400087 state(vault): update encrypted state 2026-04-04 16:10:25 +03:00
Viktor Barzin
e65647edb4 state(vault): add vabbit81 user resources 2026-03-26 17:32:34 +02:00
Viktor Barzin
b6ac68d7f2 state(vault): update encrypted state 2026-03-26 12:21:23 +02:00
Viktor Barzin
45cb49416e state(vault): update encrypted state 2026-03-25 02:48:15 +02:00
Viktor Barzin
41f53a0f3e state(vault): update encrypted state 2026-03-25 02:24:45 +02:00
Viktor Barzin
ab95e0ab2f state(vault): update encrypted state 2026-03-22 15:18:03 +02:00
Viktor Barzin
527bfb1c9e state(vault): update encrypted state 2026-03-22 01:13:02 +02:00
Viktor Barzin
03f55d969f state(vault): update encrypted state 2026-03-18 21:30:59 +00:00
Viktor Barzin
5b29cfc73a state(vault): update encrypted state 2026-03-17 23:46:56 +00:00
Viktor Barzin
4d40c51a97 state(vault): update encrypted state 2026-03-17 23:14:24 +00:00
Viktor Barzin
7a8452e4c7 state(vault): update encrypted state 2026-03-17 23:14:16 +00:00
Viktor Barzin
0215d81622 state(vault): update encrypted state 2026-03-17 23:13:57 +00:00
Viktor Barzin
750cfcce7c state(vault): update encrypted state 2026-03-17 23:13:55 +00:00
Viktor Barzin
e54ad33315 state(vault): update encrypted state 2026-03-17 23:13:19 +00:00
Viktor Barzin
02d0291797 state(vault): update encrypted state 2026-03-17 23:12:58 +00:00
Viktor Barzin
468df3c5c4 state(vault): update encrypted state 2026-03-17 23:12:35 +00:00
Viktor Barzin
cf570c3d3b state(vault): update encrypted state 2026-03-17 23:12:03 +00:00