infra/.claude/skills/post-mortem/skill.md

# Post-Mortem Writer

Generate a structured post-mortem document after an incident mitigation session.

## When to use
- After the `/post-mortem` command is run
- Auto-suggested when cluster health transitions from UNHEALTHY → HEALTHY

## Instructions

1. **Gather context**:
   - Run `.claude/scripts/sev-context.sh` to capture current cluster state
   - Review the conversation history for: what broke, timeline, root cause, what was fixed
   - Check existing post-mortems at `docs/post-mortems/` for format reference

2. **Generate the post-mortem**:
   - Use the template at `.claude/skills/post-mortem/template.md`
   - Fill in all sections from the investigation context
   - **Critical**: In the Prevention Plan tables, set the `Type` column correctly:
     - `Alert` — add/modify Prometheus alerting rules (auto-implementable)
     - `Config` — change Terraform config, NFS options, etc. (auto-implementable)
     - `Monitor` — add Uptime Kuma monitors (auto-implementable)
     - `Architecture` — storage migration, stack redesign (human-only)
     - `Investigation` — needs further research (human-only)
     - `Runbook` — document a procedure (human-only)
     - `Migration` — data or service migration (human-only)
   - Items already fixed during the session should have Status = `Done`
   - Items not yet done should have Status = `TODO`

3. **File naming**: `docs/post-mortems/<YYYY-MM-DD>-<slug>.md`
   - Slug: lowercase, hyphenated, max 5 words describing the incident

4. **Update index**: Add an entry to `docs/post-mortems/index.html`
   - Add a new card in the incidents grid with date, severity tag, title, description

5. **Commit and push**:
   ```
   git add docs/post-mortems/<file>.md docs/post-mortems/index.html
   git commit -m "docs: post-mortem for <date> <title> [ci skip]"
   git push origin master
   ```
   - Use `[ci skip]` to avoid triggering the app-stacks pipeline
   - NOTE: The postmortem-todos Woodpecker pipeline WILL still trigger, because it has its own path filter on `docs/post-mortems/*.md`
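
The file naming rule in step 3 can be sketched as a small shell helper. This is only an illustration under stated assumptions: `slugify` and `postmortem_path` are hypothetical names, not scripts that exist in the repository, and the rule for trimming to five words is one possible reading of "max 5 words".

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo): build a post-mortem file path
# from a date and a free-text incident title.

# slugify TITLE -> lowercase, hyphenated, at most 5 words
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' ' ' \
    | awk '{
        n = (NF < 5) ? NF : 5            # keep at most 5 words
        out = ""
        for (i = 1; i <= n; i++) out = out (i > 1 ? "-" : "") $i
        print out
      }'
}

# postmortem_path DATE TITLE -> docs/post-mortems/<YYYY-MM-DD>-<slug>.md
postmortem_path() {
  local date="$1" title="$2"
  # Reject dates that are not in YYYY-MM-DD form.
  if [[ ! "$date" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    echo "invalid date: $date" >&2
    return 1
  fi
  printf 'docs/post-mortems/%s-%s.md\n' "$date" "$(slugify "$title")"
}
```

For example, `postmortem_path 2026-04-14 'DNS outage'` would yield a path that can be passed directly to the `git add` command in step 5.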

## Type Reference for Prevention Plan

| Type | Auto-implementable? | Examples |
|------|---------------------|----------|
| Alert | Yes | Add PrometheusRule, modify alert thresholds |
| Config | Yes | Change Terraform variables, mount options, CronJob schedules |
| Monitor | Yes | Add Uptime Kuma HTTP/TCP monitor |
| Architecture | No | Migrate storage class, redesign HA topology |
| Investigation | No | Research kernel bug, check Proxmox forum |
| Runbook | No | Document recovery procedure |
| Migration | No | Move data between storage backends |
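
To show how the `Type` and `Status` columns drive automation, here is a minimal sketch of a filter that keeps only auto-implementable TODO rows. It assumes a hypothetical Prevention Plan layout of `| Action | Type | Status |`; the real columns in `template.md` and the logic of `parse-postmortem-todos.sh` may differ, and the function names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: classify Prevention Plan rows by Type.
# Assumed column layout (not confirmed by the template): | Action | Type | Status |

# is_auto_implementable TYPE -> exit 0 for Alert/Config/Monitor, 1 otherwise
is_auto_implementable() {
  case "$1" in
    Alert|Config|Monitor) return 0 ;;
    *) return 1 ;;
  esac
}

# safe_todos: read markdown table rows on stdin and print "Type: Action"
# for rows that are auto-implementable and still marked TODO.
safe_todos() {
  awk -F'|' '
    # Strip surrounding spaces, tabs, and backticks from a cell.
    function trim(s) { gsub(/^[ \t`]+|[ \t`]+$/, "", s); return s }
    NF >= 5 {
      action = trim($2); type = trim($3); status = trim($4)
      if ((type == "Alert" || type == "Config" || type == "Monitor") && status == "TODO")
        print type ": " action
    }
  '
}
```

Header and separator rows fall through the filter because their `Type` cell never matches one of the three auto-implementable values; human-only types and rows already marked `Done` are skipped the same way.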

**Commit** (2026-04-14 15:34:42 +00:00): `feat: post-mortem automation pipeline`

E2E workflow for incident post-mortems:

1. `/post-mortem` skill generates structured post-mortem markdown
2. Woodpecker pipeline triggers on `docs/post-mortems/*.md` changes
3. `parse-postmortem-todos.sh` extracts safe TODOs (Alert/Config/Monitor)
4. `postmortem-todo-resolver` agent implements TODOs headlessly
5. Agent updates post-mortem with Follow-up Implementation table

Components:

- `.claude/skills/post-mortem/`: writer skill + template
- `.claude/agents/postmortem-todo-resolver.md`: headless agent
- `.woodpecker/postmortem-todos.yml`: CI pipeline
- `scripts/parse-postmortem-todos.sh`: TODO extractor
- cluster-health skill: auto-suggest post-mortem after recovery

Safety: only auto-implements Alert/Config/Monitor types. Architecture/Migration/Investigation items are skipped. [ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
