infra/stacks/n8n/workflows
Viktor Barzin 99180bec42 [n8n] Fix broken DIUN auto-upgrade pipeline — missing auth token to claude-agent-service
## Context

DIUN has been detecting image updates and firing Slack + webhook
notifications for weeks, but zero automated upgrades ran because the
handoff from n8n to claude-agent-service was silently 401-ing.

The pipeline (DIUN → n8n webhook → claude-agent-service /execute →
service-upgrade agent) was migrated from DevVM SSH to K8s HTTP in
42f1c3cf. The migration wired up `claude-agent-service` (`API_BEARER_TOKEN`
env set) and updated the n8n workflow JSON to POST with `Authorization:
Bearer $env.CLAUDE_AGENT_API_TOKEN`, but it missed two things on the n8n
side:

1. The deployment didn't expose `CLAUDE_AGENT_API_TOKEN` to the n8n
   container, so the workflow sent `Authorization: Bearer ` (empty token).
2. The workflow header expression used JS concat (`='Bearer ' + $env.X`),
   which n8n 1.x does NOT evaluate in HTTP Request node header params.
   It needs the `{{ }}` expression form: `=Bearer {{ $env.X }}`.

Evidence: `claude-agent-service` logs showed only `/health` probes —
zero `/execute` calls over 12h despite DIUN firing webhooks. n8n PG
execution 2250 returned `401 Missing bearer token`.
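
The log check behind that evidence can be sketched as a small filter. This is
a minimal sketch, not something in the repo: `count_execute_calls` is a
hypothetical helper name, and the match string assumes the access-log format
shown in the verification output (`"POST /execute HTTP/1.1" 202 Accepted`).

```shell
# Hypothetical helper: reads log lines on stdin and prints how many of them
# contain a POST to /execute. Prints 0 (and still exits 0) when none match.
count_execute_calls() {
  grep -c 'POST /execute' || true
}
```

It could be fed from something like
`kubectl -n <namespace> logs deploy/claude-agent-service --since=12h | count_execute_calls`,
where the namespace and the deployment name are assumptions. A result of 0
while DIUN is firing webhooks points at the handoff rather than at DIUN.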

## This change

- Adds ExternalSecret `claude-agent-token` in the `n8n` namespace that
  pulls `api_bearer_token` from Vault `secret/claude-agent-service`
  (same source as the receiving service's token).
- Wires the token into the n8n container as env var
  `CLAUDE_AGENT_API_TOKEN` via `secret_key_ref`.
- Sets `N8N_BLOCK_ENV_ACCESS_IN_NODE=false` so expressions CAN read
  `$env.*` at all (the default in 1.x is already false, but setting it
  explicitly guards against upstream default flips).
- Fixes the workflow JSON backup (`workflows/diun-upgrade.json`) header
  expression to use `{{ $env.X }}` template syntax.

The live workflow in n8n's PG DB was also patched in place (one-time
`UPDATE workflow_entity SET nodes = REPLACE(...)` — workflows are not
TF-managed; they were imported once).
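
To guard against the concat-style expression creeping back into the workflow
backup, a check along these lines might help. It is only a sketch:
`has_concat_bearer` is a hypothetical helper, and the pattern assumes the old
expression appears in the file literally as `='Bearer ' +`.

```shell
# Hypothetical helper, not part of the repo: exits 0 if the given file still
# contains the concat-style header expression (='Bearer ' + ...), and exits
# non-zero otherwise. -F matches the pattern as a fixed string, -q is quiet.
has_concat_bearer() {
  grep -Fq "='Bearer ' +" "$1"
}
```

Run from `stacks/n8n`, usage would look like
`has_concat_bearer workflows/diun-upgrade.json && echo "concat form still present"`.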

## What is NOT in this change

- No retroactive re-run of skipped DIUN events. They'll be rediscovered
  in future scans.
- No change to the `claude-agent-service` side — its token and endpoint
  were already correct.
- No Slack alert on n8n HTTP-node failures — future work; right now a
  broken workflow fails silently unless you check Execution History.

## End-to-end verification

```
$ curl -X POST n8n.viktorbarzin.me/webhook/30805ab6-... \
    -d '{"diun_entry_status":"update","diun_entry_image":"docker.io/library/httpd","diun_entry_imagetag":"2.4.66",...}'
{"message":"Workflow was started"}  HTTP 200

# n8n PG: execution_entity latest row  → status=success
# claude-agent-service logs           → "POST /execute HTTP/1.1" 202 Accepted
```

## Reproduce locally

```
1. vault login -method=oidc
2. cd stacks/n8n && ../../scripts/tg apply
3. kubectl -n n8n exec deploy/n8n -- printenv CLAUDE_AGENT_API_TOKEN
   (should print 64-char hex)
4. Fire synthetic webhook with non-critical image (httpd / alpine)
5. Check n8n execution is success, claude-agent-service shows 202
```
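
Step 3 can be checked mechanically instead of by eye. The helper below is a
hypothetical sketch (the name `is_hex64` is not from the repo) and assumes the
token is 64 hex characters, as stated in that step. It accepts both upper and
lower case.

```shell
# Hypothetical helper: exits 0 only if the argument is exactly 64 characters
# long and every character is a hex digit; exits 1 otherwise.
is_hex64() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}
```

For example:
`is_hex64 "$(kubectl -n n8n exec deploy/n8n -- printenv CLAUDE_AGENT_API_TOKEN)" && echo ok`.
An empty value fails the length test, which is the failure mode this change
fixes.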

Closes: code-ekz
Related: code-bck

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:41:09 +00:00
.gitkeep chore: add untracked stacks, scripts, and agent configs 2026-04-15 09:33:06 +00:00
diun-upgrade.json [n8n] Fix broken DIUN auto-upgrade pipeline — missing auth token to claude-agent-service 2026-04-18 10:41:09 +00:00