monitoring: consolidate all Slack alerting to #alerts, abandon #security
Some checks are pending
ci/woodpecker/push/default Pipeline is running
Some checks are pending
ci/woodpecker/push/default Pipeline is running
The dedicated #security Slack channel was unreachable: the shared incoming
webhook (Vault secret/viktor -> alertmanager_slack_api_url) belongs to a
Slack app that isn't a member of #security, so any channel override on it
returns HTTP 404 channel_not_found. The goldmane-edges-digest was silently
failing for that reason.
Per request ("dump the security channel, post in an existing one"), route
everything to #alerts instead:
- alertmanager slack-security receiver -> #alerts (keeps its [SECURITY/<sev>]
title styling so security-lane alerts still stand out in the shared channel)
- goldmane-edges-digest CronJob SLACK_CHANNEL -> #alerts (comment only; value
was already switched and applied last change)
- AggregatorDown / DigestFailing alert summaries reworded to say #alerts
- docs swept (security.md, monitoring.md, ADR-0014, goldmane runbook,
.claude/CLAUDE.md, service-catalog, CONTEXT.md) to drop the
"invite the app / flip back to #security" caveats and state the
#security abandonment + #alerts consolidation as the current routing.
Monitoring stack applied (alertmanager rolled, live config verified:
slack-security channel is now #alerts).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
196d0db4bd
commit
fd33d1a447
9 changed files with 32 additions and 28 deletions
|
|
@ -67,7 +67,7 @@ small no matter how much traffic flows.
|
|||
|
||||
### Slack `#alerts` — daily digest
|
||||
|
||||
> **Channel note (2026-06-25):** posts to **`#alerts`**, not `#security`. The shared `alertmanager_slack_api_url` incoming webhook's Slack app is not a member of `#security`, so a channel override there returns HTTP `404 channel_not_found` (this almost certainly also breaks alertmanager's `slack-security` receiver — verify separately). To route the digest (and security alerts) to `#security`: invite that webhook's Slack app to `#security`, then set `SLACK_CHANNEL=#security` in `stacks/goldmane-edge-aggregator` and re-apply.
|
||||
> **Channel note (2026-06-25):** posts to **`#alerts`**. The dedicated `#security` channel was abandoned — the shared `alertmanager_slack_api_url` incoming webhook's Slack app is not a member of it, so a channel override there returns HTTP `404 channel_not_found`. Everything now posts to `#alerts` (this digest plus alertmanager's `slack-security` receiver, which keeps its `[SECURITY]` styling so security-lane alerts still stand out there).
|
||||
|
||||
- CronJob `goldmane-edges-digest` (08:00 Europe/London) posts edges first seen
|
||||
in the last 24h. Quiet when there are none. Reuses the existing alert-digest
|
||||
|
|
@ -282,7 +282,7 @@ ExternalSecret resolved. A dry run / smoke test: run the image with `args:
|
|||
> Known state (2026-06-25): the digest CronJob's first Job **failed** and it has
|
||||
> never successfully posted (`lastSuccessfulTime` empty) — the digest leg is the
|
||||
> live gap; `DigestFailing` is catching it. Edges still land in the DB via the
|
||||
> `aggregate` Deployment; only the `#security` notification is affected.
|
||||
> `aggregate` Deployment; only the `#alerts` digest notification is affected.
|
||||
> Investigation/fix belongs to the aggregator slice (#58/#60), not monitoring.
|
||||
|
||||
**No edges at all in the table.** Confirm Goldmane is enabled
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue