ci: Slack-notify failed pipeline runs only

Viktor doesn't want a Slack message for every CI run — only failures.
The infra apply pipeline posted a status line to #general on every push,
and the renew-tls / postmortem-todos / registry-config-sync /
pve-nfs-exports-sync crons posted on every scheduled run (~30+ routine
messages a week). Now: the apply pipeline's success post is gone
(notify-failure already covers failures), all cron notifies are
status:[failure] with explicit FAILED texts, and drift-detection is
silent when all stacks are clean (still posts drift findings and errors,
and gains a hard-failure catch step it previously lacked). Kept:
notify-nonadmin-push (org audit feed) and the actionable provision-user
post. Per-app deploy template in ci-cd.md updated to match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor Barzin 2026-07-02 07:27:43 +00:00
parent a64d2ba2b9
commit 88c86e2109
7 changed files with 39 additions and 20 deletions

@@ -324,13 +324,8 @@ steps:
  fi
  GIT_SSH_COMMAND='ssh -i ./secrets/deploy_key -o IdentitiesOnly=yes' git push origin master
- # ── Slack notification ──
- - |
- PLATFORM_COUNT=$(wc -l < .platform_apply 2>/dev/null | tr -d ' ')
- APP_COUNT=$(wc -l < .app_apply 2>/dev/null | tr -d ' ')
- curl -s -X POST -H 'Content-type: application/json' \
- --data "{\"channel\":\"general\",\"text\":\"Woodpecker CI: infra pipeline ${CI_PIPELINE_STATUS} (platform:${PLATFORM_COUNT}, apps:${APP_COUNT})\"}" \
- "$SLACK_WEBHOOK" || true
+ # (No Slack post on success — Viktor 2026-07-02: CI notifies on FAILED
+ # runs only; the notify-failure step below covers those.)
  # Slack on failure (runs even if apply step fails)
  - name: notify-failure

@@ -147,13 +147,30 @@ steps:
  echo "Drift: ${DRIFTED:-none}"
  echo "Errors: ${ERRORS:-none}"
- # ── Slack alert if drift found ──
+ # ── Slack only when something is WRONG (drift or errors) ──
+ # All-clean runs are silent (Viktor 2026-07-02: CI notifies on
+ # failed/actionable runs only; clean is the daily normal).
  if [ -n "$DRIFTED" ]; then
  curl -s -X POST -H 'Content-type: application/json' \
  --data "{\"channel\":\"general\",\"text\":\":warning: Drift detected in:${DRIFTED}\nClean: ${CLEAN} stacks. Errors:${ERRORS:-none}\"}" \
  "$SLACK_WEBHOOK" || true
- else
+ elif [ -n "$ERRORS" ]; then
  curl -s -X POST -H 'Content-type: application/json' \
- --data "{\"channel\":\"general\",\"text\":\":white_check_mark: Drift detection: all ${CLEAN} stacks clean${ERRORS:+. Errors: $ERRORS}\"}" \
+ --data "{\"channel\":\"general\",\"text\":\":red_circle: Drift detection had errors: ${ERRORS} (clean: ${CLEAN})\"}" \
  "$SLACK_WEBHOOK" || true
  fi
+ # Hard-failure catch: the in-script posts above never run if the step
+ # itself crashes early — this step is the only signal for that case.
+ - name: notify-failure
+ image: curlimages/curl
+ commands:
+ - |
+ curl -s -X POST -H 'Content-type: application/json' \
+ --data "{\"channel\":\"general\",\"text\":\":red_circle: Drift-detection pipeline FAILED (crashed before reporting)\"}" \
+ "$SLACK_WEBHOOK" || true
+ environment:
+ SLACK_WEBHOOK:
+ from_secret: slack_webhook
+ when:
+ status: [failure]
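
The branching in the drift-detection hunk above has three outcomes: drift found, errors only, and all clean (silent). A minimal sketch of that decision as a standalone function follows. It is not the pipeline's code: `drift_slack_text` is a hypothetical helper, and the message wording is simplified from the payloads in the diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): prints the Slack text a
# drift-detection run would post, or prints nothing for an all-clean run.
# Arguments: $1 = drifted stacks, $2 = errored stacks, $3 = clean count.
drift_slack_text() {
  drifted=$1
  errors=$2
  clean=$3
  if [ -n "$drifted" ]; then
    # Drift takes priority; any errors are appended to the same message.
    printf '%s\n' ":warning: Drift detected in: ${drifted} (clean: ${clean}, errors: ${errors:-none})"
  elif [ -n "$errors" ]; then
    # No drift, but some stacks could not be checked.
    printf '%s\n' ":red_circle: Drift detection had errors: ${errors} (clean: ${clean})"
  fi
  # All clean: no output, so the caller posts nothing.
}

# Caller side: only post when there is something to say.
TEXT=$(drift_slack_text "${DRIFTED:-}" "${ERRORS:-}" "${CLEAN:-0}")
if [ -n "$TEXT" ]; then
  echo "would post: $TEXT"
fi
```

Keeping the clean case as "no output" means the silence policy lives in one place, and the separate `notify-failure` step remains the only signal when the script never reaches this point.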

@@ -28,6 +28,7 @@ steps:
  from_secret: slack_webhook
  commands:
  - apk add --no-cache curl
- - "curl -sf -X POST https://hooks.slack.com/services/$SLACK_WEBHOOK -H 'Content-Type: application/json' -d '{\"text\": \"Post-mortem TODO pipeline completed\"}' || true"
+ - "curl -sf -X POST https://hooks.slack.com/services/$SLACK_WEBHOOK -H 'Content-Type: application/json' -d '{\"text\": \":red_circle: Post-mortem TODO pipeline FAILED\"}' || true"
  when:
- - status: [success, failure]
+ # Failure-only (Viktor 2026-07-02): CI notifies on failed runs only.
+ - status: [failure]

@@ -58,7 +58,8 @@ steps:
  commands:
  - |
  curl -s -X POST -H 'Content-type: application/json' \
- --data "{\"channel\":\"general\",\"text\":\"PVE /etc/exports sync: ${CI_PIPELINE_STATUS}\"}" \
+ --data "{\"channel\":\"general\",\"text\":\":red_circle: PVE /etc/exports sync FAILED\"}" \
  "$SLACK_WEBHOOK" || true
  when:
- status: [success, failure]
+ # Failure-only (Viktor 2026-07-02): CI notifies on failed runs only.
+ status: [failure]

@@ -151,7 +151,8 @@ steps:
  commands:
  - |
  curl -s -X POST -H 'Content-type: application/json' \
- --data "{\"channel\":\"general\",\"text\":\"Registry config sync on 10.0.20.10: ${CI_PIPELINE_STATUS}\"}" \
+ --data "{\"channel\":\"general\",\"text\":\":red_circle: Registry config sync on 10.0.20.10 FAILED\"}" \
  "$SLACK_WEBHOOK" || true
  when:
- status: [success, failure]
+ # Failure-only (Viktor 2026-07-02): CI notifies on failed runs only.
+ status: [failure]

@@ -71,10 +71,11 @@ steps:
  commands:
  - |
  curl -s -X POST -H 'Content-type: application/json' \
- --data "{\"channel\":\"general\",\"text\":\"Woodpecker CI: TLS certificate renewal ${CI_PIPELINE_STATUS}\"}" \
+ --data "{\"channel\":\"general\",\"text\":\":red_circle: Woodpecker CI: TLS certificate renewal FAILED\"}" \
  "$SLACK_WEBHOOK" || true
  environment:
  SLACK_WEBHOOK:
  from_secret: slack_webhook
  when:
- status: [success, failure]
+ # Failure-only (Viktor 2026-07-02): successful renewals are routine.
+ status: [failure]
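
The cron pipelines above (postmortem-todos, pve-nfs-exports-sync, registry-config-sync, renew-tls) now end in the same shape: a fixed FAILED text posted from a step gated on `status: [failure]`. A minimal sketch of that shared pattern as one shell function is shown below. `notify_failed` and the `DRY_RUN` switch are hypothetical and not part of this commit; the sketch assumes `SLACK_WEBHOOK` holds the full webhook URL, as in most of the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): builds the failure-only
# Slack payload used by the cron pipelines and posts it.
# Argument: $1 = human-readable pipeline name.
# Note: $1 is inserted as-is, so it must not contain quotes or backslashes.
notify_failed() {
  payload=$(printf '{"channel":"general","text":":red_circle: %s FAILED"}' "$1")
  if [ -n "${DRY_RUN:-}" ]; then
    # Assumed test switch: print the payload instead of sending it.
    printf '%s\n' "$payload"
    return 0
  fi
  # "|| true" keeps a Slack outage from masking the original failure.
  curl -s -X POST -H 'Content-type: application/json' \
    --data "$payload" "$SLACK_WEBHOOK" || true
}

# Dry-run demonstration; nothing is sent to Slack.
DRY_RUN=1
notify_failed "PVE /etc/exports sync"
```

The success/failure gating stays in the pipeline's `when:` block; the helper only decides what the failure message looks like.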

@@ -293,7 +293,9 @@ The infra repo runs on Woodpecker via **two** forge registrations: the Forgejo
  forge (repo id 82, registered 2026-06-08) and the legacy GitHub forge (repo id
  1). Pushes to **Forgejo** `master` fire `.woodpecker/default.yml`
  (changed-stacks terragrunt apply, in `infra-ci`) plus the `notify-nonadmin-push`
- Slack audit step. Operational facts (2026-06-10):
+ Slack audit step. **Slack policy (2026-07-02): every infra pipeline posts only
+ on FAILURE** (plus the non-admin audit post and drift/error findings) — routine
+ successful runs are silent. Operational facts (2026-06-10):
  - **Webhook URL is the IN-CLUSTER service**:
  `http://woodpecker-server.woodpecker.svc.cluster.local/api/hook?...` (PATCHed
@@ -375,7 +377,8 @@ steps:
  notify:
  image: plugins/slack
  when:
- status: [success, failure]
+ # Failure-only (2026-07-02 policy): CI notifies about failed runs only.
+ status: [failure]
  ```
  ### CI/CD secrets sync