[ci skip] claudeception: extract 2 skills from today's session

1. sops-age-secrets-migration: Complete guide for migrating from git-crypt
   to SOPS+age. Covers JSON format requirement, race condition avoidance,
   CI integration, complex types, and migration sequence.

2. iterative-plan-review-with-subagents: Design pattern for reviewing plans
   with parallel security + implementation subagents. 2-3 iterations to
   zero CRITICALs. Used successfully for the SOPS migration design.
Viktor Barzin 2026-03-07 15:46:36 +00:00
parent 9f2ac0fd1a
commit 7cc7991ce6
2 changed files with 196 additions and 0 deletions

@ -0,0 +1,80 @@
---
name: iterative-plan-review-with-subagents
description: |
Design pattern for reviewing implementation plans using parallel subagent reviewers
with iterative refinement. Use when: (1) designing a complex infrastructure change
that needs security + implementation review, (2) creating a migration plan with
multiple phases, (3) any plan where missing a critical issue could cause data loss
or security exposure. Spawns 2 reviewer agents (security + implementation), collects
CRITICAL/IMPORTANT/NIT findings, fixes all CRITICALs, re-runs until zero CRITICALs.
Typically converges in 2-3 iterations.
author: Claude Code
version: 1.0.0
date: 2026-03-07
---
# Iterative Plan Review with Subagents
## Problem
Complex infrastructure plans have blind spots — security issues, implementation
incompatibilities, race conditions, format mismatches. A single reviewer misses things.
Multiple reviewers with different expertise catch more.
## Context / Trigger Conditions
- Writing a migration plan (e.g., secrets management, storage migration)
- Designing a multi-phase infrastructure change
- Any plan where a missed issue = downtime, data loss, or security exposure
- User explicitly asks for plan review
## Solution
### 1. Write the plan as a markdown document
Save to `docs/plans/YYYY-MM-DD-<topic>.md`
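
A minimal sketch of building that path for today's date. The helper name `plan_path` is hypothetical and not part of any tool:
```bash
# plan_path TOPIC: print the plan file path for today's date.
plan_path() {
  printf 'docs/plans/%s-%s.md\n' "$(date +%Y-%m-%d)" "$1"
}
```
For example, `plan_path sops-migration` prints a path of the form `docs/plans/YYYY-MM-DD-sops-migration.md`.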
### 2. Spawn 2 reviewer agents in parallel
```
Agent 1: Security reviewer
- Focus: secret exposure, access control, key management, CI pipeline security
- Classify each finding: CRITICAL / IMPORTANT / NIT
Agent 2: Implementation reviewer
- Focus: format compatibility, race conditions, ordering, tool behavior
- Classify each finding: CRITICAL / IMPORTANT / NIT
```
Key: give each reviewer specific focus areas and the actual source code to check against.
### 3. Consolidate and fix CRITICALs
- Merge findings from both reviewers
- Deduplicate (both often find the same issue)
- Fix ALL CRITICALs in the plan document
- Note IMPORTANTs for implementation phase
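
The consolidation step can be sketched in shell, assuming each reviewer writes its findings to a file with one `SEVERITY<TAB>description` line per finding. That file format and both function names are assumptions made for illustration, not something the reviewers produce by default:
```bash
# Hypothetical findings format: one finding per line, "SEVERITY<TAB>description".

# merge_findings FILE...: merge reviewer outputs and drop exact duplicates.
merge_findings() {
  cat "$@" | sort -u
}

# count_criticals: count the CRITICAL findings read from stdin.
count_criticals() {
  # grep -c exits non-zero when the count is 0, so mask the exit status.
  grep -c '^CRITICAL' || true
}
```
This only removes findings that are worded identically; two reviewers describing the same issue differently still need a manual merge.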
### 4. Re-run reviewers on the updated plan
- Same 2 agents, but tell them which CRITICALs were fixed
- Ask them to VERIFY fixes are correct AND find new issues
- Repeat until zero CRITICALs
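
The loop itself might look like the sketch below. `review_until_clean` and the review command it calls are hypothetical: the command is assumed to take the iteration number and print the number of CRITICAL findings for that round.
```bash
# review_until_clean REVIEW_CMD [MAX_ITERATIONS]
# REVIEW_CMD is a hypothetical command or function that takes the iteration
# number and prints how many CRITICAL findings that round produced.
review_until_clean() {
  local review_cmd="$1" max="${2:-5}" i=1 criticals
  while [ "$i" -le "$max" ]; do
    criticals="$("$review_cmd" "$i")"
    echo "v$i: $criticals CRITICALs"
    if [ "$criticals" -eq 0 ]; then
      return 0
    fi
    # Fix every CRITICAL in the plan document before the next round.
    i=$((i + 1))
  done
  # Still not clean after MAX_ITERATIONS rounds.
  return 1
}
```
Capping the number of rounds keeps a plan that never converges from looping indefinitely; the default of 5 is an arbitrary choice.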
### 5. Typical convergence
- v1: 5-6 CRITICALs (format issues, race conditions, missing steps)
- v2: 2-3 CRITICALs (fixes introduced new issues, missed edge cases)
- v3: 0 CRITICALs, only IMPORTANTs remaining
## Example Findings from Real Usage (SOPS migration)
| Iteration | CRITICALs Found | Examples |
|-----------|----------------|---------|
| v1 | 6 | YAML≠HCL format, `git add .` commits secrets, no branch protection, parallel race condition |
| v2 | 3 | `SOPS_AGE_KEY_FILE` misunderstanding, `renew-tls.yml` not updated, plan leaks in PR logs |
| v3 | 0 | All verified fixed. 6 IMPORTANTs noted for implementation. |
## Verification
- Zero CRITICALs from both reviewers on the final iteration
- IMPORTANTs documented as implementation notes (not blockers)
## Notes
- Use `sonnet` model for reviewers (fast, thorough enough for review)
- Give reviewers actual source code paths to read, not just the plan
- Tell v2+ reviewers what was fixed so they verify, not re-discover
- The final review should say "ONLY report CRITICALs" to avoid noise
- This pattern cost ~$3-5 in API calls but caught issues that would have caused hours of debugging

@ -0,0 +1,116 @@
---
name: sops-age-secrets-migration
description: |
Migrate from git-crypt to SOPS + age for multi-user secret management in a
Terraform/Terragrunt infrastructure repo. Use when: (1) need per-user secret
access control (git-crypt is all-or-nothing), (2) want operators to push PRs
without seeing secrets (CI decrypts), (3) migrating from a single encrypted
terraform.tfvars to structured secret management. Covers: JSON format (not YAML
— Terraform can't parse YAML tfvars), race condition avoidance with parallel
terragrunt applies, CI pipeline integration with Woodpecker, age key management,
and the complete migration sequence.
author: Claude Code
version: 1.0.0
date: 2026-03-07
---
# SOPS + age Secrets Migration from git-crypt
## Problem
git-crypt encrypts entire files — anyone with the key decrypts everything. For multi-user
setups where operators should push code without seeing secrets, you need per-value encryption
with CI-only decryption.
## Context / Trigger Conditions
- Single `terraform.tfvars` encrypted with git-crypt containing 100+ secrets
- Need to onboard operators who shouldn't see API keys, passwords, SSH keys
- Want GitOps (secrets in git) but with access control
- Terraform/Terragrunt stack-per-service architecture
## Solution
### 1. Use JSON, not YAML
SOPS writes output in the same format as its input: `sops -d file.yaml` → YAML, `sops -d file.json` → JSON.
Terraform natively supports `*.auto.tfvars.json` files, but it has no YAML variable-file format, so the secrets must be stored as JSON.
```
secrets.sops.json → sops -d → secrets.auto.tfvars.json → Terraform reads it
```
### 2. Split tfvars into config + secrets
```
config.tfvars ← plaintext (hostnames, IPs, DNS records)
secrets.sops.json ← SOPS-encrypted (passwords, tokens, keys)
```
### 3. Global decrypt, not per-stack hooks
**CRITICAL**: Do NOT use `before_hook`/`after_hook` for decryption. With `terragrunt run --all`,
70+ stacks run hooks in parallel, all writing to the same output file — race condition.
Instead, use a wrapper script that decrypts once:
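```bash
#!/usr/bin/env bash
# scripts/tg: decrypt secrets once, then run terragrunt
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
ENC="$REPO_ROOT/secrets.sops.json"
DEC="$REPO_ROOT/secrets.auto.tfvars.json"
# Re-decrypt only when the plaintext is missing or older than the encrypted file.
if [ ! -f "$DEC" ] || [ "$ENC" -nt "$DEC" ]; then
  # Write to a temp file first so a failed decrypt never leaves a truncated file
  # that would look up to date on the next run.
  (umask 077; sops -d "$ENC" > "$DEC.tmp")
  mv "$DEC.tmp" "$DEC"
fi
exec terragrunt "$@"
```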
### 4. Terragrunt loads both (backward compatible)
```hcl
terraform {
  extra_arguments "common_vars" {
    commands           = get_terraform_commands_that_need_vars()
    required_var_files = ["${get_repo_root()}/config.tfvars"]
    optional_var_files = [
      "${get_repo_root()}/terraform.tfvars",         # legacy (git-crypt)
      "${get_repo_root()}/secrets.auto.tfvars.json"  # new (SOPS)
    ]
  }
  before_hook "check_secrets" {
    commands = ["apply", "plan", "destroy"]
    execute  = ["test", "-f", "${get_repo_root()}/secrets.auto.tfvars.json"]
  }
}
```
### 5. Complex types work in JSON
Maps, lists, nested objects, multiline strings (SSH keys as `\n`-escaped) all work:
```json
{
  "simple_password": "abc123",
  "mailserver_accounts": {"user@domain": "pass"},
  "ssh_key": "-----BEGIN OPENSSH PRIVATE KEY-----\nb3Blbn...\n-----END OPENSSH PRIVATE KEY-----\n"
}
```
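One way to produce the `\n`-escaped form of a key file is sketched below. The function name `pem_to_json_string` is hypothetical, and the sketch assumes PEM-style input (base64 and dashes only, no quotes, backslashes, or control characters); a JSON-aware tool is the safer choice for arbitrary text.
```bash
# pem_to_json_string FILE: print FILE as a JSON string with "\n"-escaped newlines.
# Assumes PEM-style content: no quotes, backslashes, or control characters.
pem_to_json_string() {
  printf '"'
  # Emit each line followed by a literal backslash-n instead of a newline.
  awk '{ printf "%s\\n", $0 }' "$1"
  printf '"\n'
}
```
The output can be pasted as the value of a key such as `ssh_key` before the file is encrypted.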
### 6. CI integration (Woodpecker)
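- Store the age private key as a CI secret (`SOPS_AGE_KEY`).
- Woodpecker `from_secret` only sets environment variables, so write the key to a temp file if you use `SOPS_AGE_KEY_FILE`. Recent SOPS releases also read the key directly from `SOPS_AGE_KEY`.
- Stage explicit paths only: `git add stacks/ state/ .woodpecker/`. NEVER `git add .`, which can commit decrypted secrets.
- Add a cleanup step with `status: [success, failure]` so cleanup runs even when the pipeline fails.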
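
A minimal sketch of the temp-file handling for `SOPS_AGE_KEY_FILE`, assuming the CI secret is exposed to the step as the `SOPS_AGE_KEY` environment variable. The function names are hypothetical:
```bash
# setup_age_key: write $SOPS_AGE_KEY to a private temp file and point SOPS at it.
setup_age_key() {
  : "${SOPS_AGE_KEY:?SOPS_AGE_KEY is not set}"
  local key_file
  key_file="$(umask 077 && mktemp)"
  printf '%s\n' "$SOPS_AGE_KEY" > "$key_file"
  export SOPS_AGE_KEY_FILE="$key_file"
}

# cleanup_age_key: remove the key file; safe to call more than once.
cleanup_age_key() {
  if [ -n "${SOPS_AGE_KEY_FILE:-}" ]; then
    rm -f "$SOPS_AGE_KEY_FILE"
    unset SOPS_AGE_KEY_FILE
  fi
}
```
In a pipeline step this would typically be used as `setup_age_key; trap cleanup_age_key EXIT` before calling `sops -d`, so the key file is removed even if a later command fails.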
## Verification
```bash
# Encrypt
sops -e -i secrets.sops.json
# Decrypt and verify
sops -d secrets.sops.json | jq .
# Verify SSH keys
sops -d secrets.sops.json | jq -r '.ssh_key' | ssh-keygen -l -f -
# Test with terragrunt
scripts/tg validate
```
## Notes
- Keep git-crypt for binary files (TLS certs, deploy keys): SOPS can only encrypt a binary file as a single opaque blob, with no per-value structure
- Set `sensitive = true` on every secret variable declaration so values are redacted in plan output
- Don't add `sensitive = true` to non-secret variables (e.g., `tls_secret_name`, `ingress_path`), even when the name contains "secret": Terraform rejects sensitive values as `for_each` arguments
- Age keys are one line — much simpler than GPG
- `.sops.yaml` path_regex should be anchored: `^secrets\.sops\.json$`
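
The effect of anchoring can be checked with a quick regex test. This sketch uses `grep -E` as a stand-in for SOPS's own matcher, which is an approximation, and the helper name `matches_rule` is hypothetical:
```bash
# matches_rule REGEX PATH: succeed if PATH matches REGEX.
# grep -E stands in for SOPS's own regex matcher here (an approximation).
matches_rule() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}
```
With this helper, the unanchored pattern `secrets\.sops\.json` also matches a path such as `old/secrets.sops.json.bak`, while `^secrets\.sops\.json$` matches only the intended file.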