## Context
Since the 2026-04-15 migration from SSH-on-DevVM to in-cluster
claude-agent-service, the agent spec's four `vault kv get ...` calls
have been dead code: the pod has no `VAULT_TOKEN`, no `~/.vault-token`,
no Vault login method, and port 8200 is refused. Every token fetch
returns empty, which silently breaks:
- **Slack**: `SLACK_WEBHOOK=""` → POSTs 404 → no messages for 3+ days
(the exact user-visible symptom that started this thread).
- **Woodpecker CI polling**: `WOODPECKER_TOKEN=""` → 401 on
`/api/repos/1/pipelines` → agent can't find its own pipeline → 15-min
poll times out → jumps to rollback → same failure in the revert → hits
n8n's 30-min ceiling → SIGKILL mid-saga → no commit, no Slack.
- **Changelog fetch**: `GITHUB_TOKEN=""` overrides the env var supplied
by `envFrom: claude-agent-secrets`, crippling changelog lookups too.
Separately, Step 9 read the overall pipeline `status`, which is
`failure` any time a single workflow fails — e.g. the unrelated
`build-cli` workflow (docker image push to registry.viktorbarzin.me:5050
has been erroring since private-registry htpasswd was enabled on
2026-03-22). That made the agent spuriously rollback every otherwise-
successful upgrade.
## This change
- Replace the four `vault kv get ...` invocations with the matching
env-var reads (`$GITHUB_TOKEN`, `$WOODPECKER_API_TOKEN`,
`$SLACK_WEBHOOK_URL`) and document the env-var contract at the top
of the "Environment" section. The env vars are expected to be
pre-loaded via `envFrom: claude-agent-secrets` — that part is tracked
as the companion ExternalSecret/Terraform change in bd code-3o3
(must land before this spec is effective).
- Rewrite Step 9 to poll the `default` workflow's `state` instead of
the overall pipeline `status`. Adds a jq example and explicitly
documents the build-cli noise so future operators know why overall
status is unreliable.
## What is NOT in this change
- The matching ExternalSecret / Terraform changes that feed
WOODPECKER_API_TOKEN / SLACK_WEBHOOK_URL / REGISTRY_USER /
REGISTRY_PASSWORD into the pod. Until those land, this spec still
produces empty env vars at runtime — but at least the *shape* of the
contract is correct and grep-friendly.
- The .woodpecker/build-cli.yml `logins:` entry for
registry.viktorbarzin.me:5050. That's fix C in the same task.
## Test Plan
### Automated
None — this is pure markdown guidance for the model. Syntax-checked by
`grep -nE 'vault kv get|WOODPECKER_TOKEN|SLACK_WEBHOOK[^_]'
.claude/agents/service-upgrade.md` showing only the explanatory
warning on line 37 as a match.
### Manual Verification
After the companion ExternalSecret change lands and the pod has
WOODPECKER_API_TOKEN + SLACK_WEBHOOK_URL in env:
1. Trigger a DIUN-style webhook on a known slow service.
2. Watch `kubectl -n claude-agent logs -f deploy/claude-agent-service`.
3. Expect curl to `ci.viktorbarzin.me/api/...` return 200 and pipeline
JSON (no 401), and Slack `$SLACK_WEBHOOK_URL` return 200.
4. Expect a Slack `[Upgrade Agent] Starting:` post inside the first
minute, and a `SUCCESS` or `FAILED + ROLLED BACK` post on exit.
Refs: bd code-3o3
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>