Commit graph

5 commits

Author SHA1 Message Date
Viktor Barzin
e7ce545da2 [job-hunter] Add infra stack + Grafana dashboard + n8n digest workflow
New service stack at stacks/job-hunter/ mirroring the payslip-ingest
pattern: per-service CNPG database + role (via dbaas null_resource),
Vault static role pg-job-hunter (7d rotation), ExternalSecrets for app
secrets and DB creds, Deployment with alembic-migrate init container,
ClusterIP Service, Grafana datasource ConfigMap.

Grafana dashboard job-hunter.json in Finance folder: new roles per
day, source breakdown, top companies, GBP salary distribution, recent
roles table (sorted by parse confidence then salary).

n8n weekly-digest workflow calls POST /digest/generate with bearer
auth every Monday 07:00 London; digest_runs table provides
idempotency.

Refs: code-snp

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:09:29 +00:00
Viktor Barzin
99180bec42 [n8n] Fix broken DIUN auto-upgrade pipeline — missing auth token to claude-agent-service
## Context

DIUN has been detecting image updates and firing Slack + webhook
notifications for weeks, but zero automated upgrades ran because the
handoff from n8n to claude-agent-service was silently 401-ing.

The pipeline (DIUN → n8n webhook → claude-agent-service /execute →
service-upgrade agent) was migrated from DevVM SSH to K8s HTTP in
42f1c3cf. The migration wired `claude-agent-service` (API_BEARER_TOKEN
env set), updated the n8n workflow JSON to POST with `Authorization:
Bearer $env.CLAUDE_AGENT_API_TOKEN`, but missed two things on the n8n
side:

1. The deployment didn't expose `CLAUDE_AGENT_API_TOKEN` to the n8n
   container — workflow sent `Authorization: Bearer ` (empty).
2. The workflow header expression used JS concat (`='Bearer ' + $env.X`)
   which n8n 1.x does NOT evaluate in HTTP Request node header params.
   It needs template-literal form: `=Bearer {{ $env.X }}`.

Evidence: `claude-agent-service` logs showed only `/health` probes —
zero `/execute` calls over 12h despite DIUN firing webhooks. n8n PG
execution 2250 returned `401 Missing bearer token`.

## This change

- Adds ExternalSecret `claude-agent-token` in the `n8n` namespace that
  pulls `api_bearer_token` from Vault `secret/claude-agent-service`
  (same source as the receiving service's token).
- Wires the token into the n8n container as env var
  `CLAUDE_AGENT_API_TOKEN` via `secret_key_ref`.
- Sets `N8N_BLOCK_ENV_ACCESS_IN_NODE=false` so expressions CAN read
  `$env.*` at all (default in 1.x is false already, but setting
  explicitly guards against upstream default flips).
- Fixes the workflow JSON backup (`workflows/diun-upgrade.json`) header
  expression to use `{{ $env.X }}` template syntax.

The live workflow in n8n's PG DB was also patched in place (one-time
`UPDATE workflow_entity SET nodes = REPLACE(...)` — workflows are not
TF-managed; they were imported once).

## What is NOT in this change

- No retroactive re-run of skipped DIUN events. They'll be rediscovered
  in future scans.
- No change to the `claude-agent-service` side — its token and endpoint
  were already correct.
- No Slack alert on n8n HTTP-node failures — future work; right now a
  broken workflow fails silently unless you check Execution History.

## End-to-end verification

```
$ curl -X POST n8n.viktorbarzin.me/webhook/30805ab6-... \
    -d '{"diun_entry_status":"update","diun_entry_image":"docker.io/library/httpd","diun_entry_imagetag":"2.4.66",...}'
{"message":"Workflow was started"}  HTTP 200

# n8n PG: execution_entity latest row  → status=success
# claude-agent-service logs           → "POST /execute HTTP/1.1" 202 Accepted
```

## Reproduce locally

```
1. vault login -method=oidc
2. cd stacks/n8n && ../../scripts/tg apply
3. kubectl -n n8n exec deploy/n8n -- printenv CLAUDE_AGENT_API_TOKEN
   (should print 64-char hex)
4. Fire synthetic webhook with non-critical image (httpd / alpine)
5. Check n8n execution is success, claude-agent-service shows 202
```

Closes: code-ekz
Related: code-bck

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:41:09 +00:00
Viktor Barzin
42f1c3cf4f [claude-agent-service] Migrate all pipelines from DevVM SSH to K8s HTTP
## Context

The claude-agent-service K8s pod (deployed 2026-04-15) provides an HTTP API
for running Claude headless agents. Three workflows still SSH'd to the DevVM
(10.0.10.10) to invoke `claude -p`. This eliminates that dependency.

## This change

Pipeline migrations (SSH → HTTP POST to claude-agent-service):
- `.woodpecker/issue-automation.yml` — Vault auth fetches API token instead
  of SSH key; curl POST /execute + poll /jobs/{id} replaces SSH invocation
- `scripts/postmortem-pipeline.sh` — same pattern; uses jq for safe JSON
  construction of TODO payloads
- `.woodpecker/postmortem-todos.yml` — drop openssh-client from apk install
- `stacks/n8n/workflows/diun-upgrade.json` — SSH node replaced with HTTP
  Request node; API token via $env.CLAUDE_AGENT_API_TOKEN (added to Vault
  secret/n8n)

Documentation updates:
- `docs/architecture/incident-response.md` — Mermaid diagram: DevVM → K8s
- `docs/architecture/automated-upgrades.md` — pipeline diagram + n8n action
- `AGENTS.md` — pipeline description updated

## What is NOT in this change

- DevVM decommissioning (still hosts terminal/foolery services)
- Removal of SSH key secrets from Vault (kept for rollback)
- n8n workflow import (must be done manually in n8n UI)

[ci skip]

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
2026-04-18 10:12:02 +00:00
Viktor Barzin
c33f597111 feat(upgrade-agent): add automated service upgrade pipeline with n8n + DIUN
Pipeline: DIUN detects new image versions every 6h → webhook to n8n →
n8n filters (skip databases/custom/infra/:latest) and rate-limits
(max 5/6h) → SSH to dev VM → claude -p runs upgrade agent.

Agent workflow: resolve GitHub repo → fetch changelogs → classify risk
(SAFE/CAUTION) → backup DB if needed → bump version in .tf → commit+push
→ wait for CI → verify (pod ready + HTTP + Uptime Kuma) → rollback on
failure.

Changes:
- stacks/n8n: add N8N_PORT=5678 to fix K8s env var conflict
- stacks/n8n/workflows: version-controlled n8n workflow backup
- docs/architecture/automated-upgrades.md: full pipeline documentation
- AGENTS.md: add upgrade agent section
- service-catalog.md: update DIUN description

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:38:27 +00:00
Viktor Barzin
bcad200a23 chore: add untracked stacks, scripts, and agent configs
- New stacks: beads-server, hermes-agent
- Terragrunt tiers.tf for infra, phpipam, status-page
- Secrets symlinks for vault, phpipam, hermes-agent
- Scripts: cluster_manager, image_pull, containerd pullthrough setup
- Frigate config, audiblez-web app source, n8n workflows dir
- Claude agent: service-upgrade, reference: upgrade-config.json
- Removed: claudeception skill, excalidraw empty submodule, temp listings

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:33:06 +00:00