[ci skip] claudeception: extract 2 skills from today's session

1. sops-age-secrets-migration: Complete guide for migrating from git-crypt to SOPS+age. Covers JSON format requirement, race condition avoidance, CI integration, complex types, and migration sequence.
2. iterative-plan-review-with-subagents: Design pattern for reviewing plans with parallel security + implementation subagents. 2-3 iterations to zero CRITICALs. Used successfully for the SOPS migration design.
2026-03-07 15:46:36 +00:00 · 2026-03-07 15:46:36 +00:00 · 7cc7991ce6
commit 7cc7991ce6
parent 9f2ac0fd1a
2 changed files with 196 additions and 0 deletions
--- a/.claude/skills/archived/iterative-plan-review-with-subagents/SKILL.md
+++ b/.claude/skills/archived/iterative-plan-review-with-subagents/SKILL.md
@@ -0,0 +1,80 @@
+---
+name: iterative-plan-review-with-subagents
+description: |
+  Design pattern for reviewing implementation plans using parallel subagent reviewers
+  with iterative refinement. Use when: (1) designing a complex infrastructure change
+  that needs security + implementation review, (2) creating a migration plan with
+  multiple phases, (3) any plan where missing a critical issue could cause data loss
+  or security exposure. Spawns 2 reviewer agents (security + implementation), collects
+  CRITICAL/IMPORTANT/NIT findings, fixes all CRITICALs, re-runs until zero CRITICALs.
+  Typically converges in 2-3 iterations.
+author: Claude Code
+version: 1.0.0
+date: 2026-03-07
+---
+
+# Iterative Plan Review with Subagents
+
+## Problem
+Complex infrastructure plans have blind spots: security issues, implementation
+incompatibilities, race conditions, format mismatches. A single reviewer tends to miss some
+of them; reviewers with different focus areas catch more.
+
+## Context / Trigger Conditions
+- Writing a migration plan (e.g., secrets management, storage migration)
+- Designing a multi-phase infrastructure change
+- Any plan where a missed issue means downtime, data loss, or security exposure
+- User explicitly asks for plan review
+
+## Solution
+
+### 1. Write the plan as a markdown document
+Save to `docs/plans/YYYY-MM-DD-<topic>.md`
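+
+A minimal sketch for generating that path from the shell; `plan_path` is a hypothetical helper and the topic slug is whatever you pass in:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: print docs/plans/YYYY-MM-DD-<topic>.md for today's date
+plan_path() {
+  local topic="$1"
+  printf 'docs/plans/%s-%s.md\n' "$(date +%F)" "$topic"
+}
+
+# Example: create the plan file (and its directory) if it does not exist yet
+# path="$(plan_path sops-migration)"; mkdir -p "$(dirname "$path")"; touch "$path"
+```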
+
+### 2. Spawn 2 reviewer agents in parallel
+```
+Agent 1: Security reviewer
+- Focus: secret exposure, access control, key management, CI pipeline security
+- Classify each finding: CRITICAL / IMPORTANT / NIT
+
+Agent 2: Implementation reviewer
+- Focus: format compatibility, race conditions, ordering, tool behavior
+- Classify each finding: CRITICAL / IMPORTANT / NIT
+```
+
+Key: give each reviewer specific focus areas and the actual source code to check against.
+
+### 3. Consolidate and fix CRITICALs
+- Merge findings from both reviewers
+- Deduplicate (both often find the same issue)
+- Fix ALL CRITICALs in the plan document
+- Note IMPORTANTs for implementation phase
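+
+The merge-and-dedupe step can be sketched in shell, assuming each reviewer's output has been normalized to one finding per line as `SEVERITY<TAB>description` (a hypothetical format; `consolidate_findings` and `count_criticals` are illustrative names). Only exact duplicates collapse; the same issue worded differently by the two reviewers still needs a manual pass:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical format: one finding per line, "SEVERITY<TAB>description",
+# where SEVERITY is CRITICAL, IMPORTANT or NIT.
+
+# Merge reviewer files, keep well-formed lines, drop exact duplicates,
+# then order by severity (CRITICAL first) and description.
+consolidate_findings() {
+  cat "$@" |
+    awk -F'\t' '$1 ~ /^(CRITICAL|IMPORTANT|NIT)$/' |
+    sort -u |
+    awk -F'\t' '{ rank = ($1 == "CRITICAL") ? 1 : (($1 == "IMPORTANT") ? 2 : 3); print rank "\t" $0 }' |
+    sort -t$'\t' -k1,1n -k3 |
+    cut -f2-
+}
+
+# Count CRITICAL lines in consolidated findings read from stdin (prints 0 if none).
+count_criticals() {
+  grep -c $'^CRITICAL\t' || true
+}
+```
+
+The CRITICAL count from `count_criticals` is the number that has to reach zero before the plan is accepted.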
+
+### 4. Re-run reviewers on the updated plan
+- Same 2 agents, but tell them which CRITICALs were fixed
+- Ask them to VERIFY fixes are correct AND find new issues
+- Repeat until zero CRITICALs
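+
+The review loop itself, sketched with two hypothetical hooks you would supply: `run_reviewers` (spawns both agents and prints consolidated `SEVERITY<TAB>description` lines) and `fix_criticals` (updates the plan from the findings on stdin). The iteration cap is an added safeguard, not part of the original pattern:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical hooks, supplied by you:
+#   run_reviewers PLAN ITER  -> prints "SEVERITY<TAB>description" lines
+#   fix_criticals PLAN       -> reads findings on stdin, updates the plan
+review_until_clean() {
+  local plan="$1" max_iter="${2:-5}" iter findings criticals
+  for ((iter = 1; iter <= max_iter; iter++)); do
+    findings="$(run_reviewers "$plan" "$iter")"
+    criticals="$(printf '%s\n' "$findings" | grep -c $'^CRITICAL\t' || true)"
+    echo "v$iter: $criticals CRITICAL(s)"
+    if [ "$criticals" -eq 0 ]; then
+      # Converged: remaining IMPORTANT/NIT findings become implementation notes
+      printf '%s\n' "$findings" | grep -v $'^CRITICAL\t' || true
+      return 0
+    fi
+    # Fix every CRITICAL; the next round should verify these fixes
+    fix_criticals "$plan" <<<"$findings"
+  done
+  echo "still have CRITICALs after $max_iter iterations" >&2
+  return 1
+}
+```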
+
+### 5. Typical convergence
+- v1: 5-6 CRITICALs (format issues, race conditions, missing steps)
+- v2: 2-3 CRITICALs (fixes introduced new issues, missed edge cases)
+- v3: 0 CRITICALs, only IMPORTANTs remaining
+
+## Example Findings from Real Usage (SOPS migration)
+
+| Iteration | CRITICALs Found | Examples |
+|-----------|----------------|---------|
+| v1 | 6 | YAML≠HCL format, `git add .` commits secrets, no branch protection, parallel race condition |
+| v2 | 3 | `SOPS_AGE_KEY_FILE` misunderstanding, `renew-tls.yml` not updated, plan leaks in PR logs |
+| v3 | 0 | All verified fixed. 6 IMPORTANTs noted for implementation. |
+
+## Verification
+- Zero CRITICALs from both reviewers on the final iteration
+- IMPORTANTs documented as implementation notes (not blockers)
+
+## Notes
+- Use `sonnet` model for reviewers (fast, thorough enough for review)
+- Give reviewers actual source code paths to read, not just the plan
+- Tell v2+ reviewers what was fixed so they verify, not re-discover
+- The final review should say "ONLY report CRITICALs" to avoid noise
+- This pattern cost ~$3-5 in API calls but caught issues that would have caused hours of debugging
--- a/.claude/skills/archived/sops-age-secrets-migration/SKILL.md
+++ b/.claude/skills/archived/sops-age-secrets-migration/SKILL.md
@@ -0,0 +1,116 @@
+---
+name: sops-age-secrets-migration
+description: |
+  Migrate from git-crypt to SOPS + age for multi-user secret management in a
+  Terraform/Terragrunt infrastructure repo. Use when: (1) need per-user secret
+  access control (git-crypt is all-or-nothing), (2) want operators to push PRs
+  without seeing secrets (CI decrypts), (3) migrating from a single encrypted
+  terraform.tfvars to structured secret management. Covers: JSON format (not YAML
+  — Terraform can't parse YAML tfvars), race condition avoidance with parallel
+  terragrunt applies, CI pipeline integration with Woodpecker, age key management,
+  and the complete migration sequence.
+author: Claude Code
+version: 1.0.0
+date: 2026-03-07
+---
+
+# SOPS + age Secrets Migration from git-crypt
+
+## Problem
+git-crypt encrypts entire files — anyone with the key decrypts everything. For multi-user
+setups where operators should push code without seeing secrets, you need per-value encryption
+with CI-only decryption.
+
+## Context / Trigger Conditions
+- Single `terraform.tfvars` encrypted with git-crypt containing 100+ secrets
+- Need to onboard operators who shouldn't see API keys, passwords, SSH keys
+- Want GitOps (secrets in git) but with access control
+- Terraform/Terragrunt stack-per-service architecture
+
+## Solution
+
+### 1. Use JSON, not YAML
+By default SOPS infers the format from the file extension and decrypts to the same format:
+`sops -d file.yaml` → YAML, `sops -d file.json` → JSON. Terraform natively reads JSON variable
+files (`*.tfvars.json`, including `*.auto.tfvars.json`). YAML is NOT valid HCL.
+
+```
+secrets.sops.json → sops -d → secrets.auto.tfvars.json → Terraform reads it
+```
+
+### 2. Split tfvars into config + secrets
+```
+config.tfvars          ← plaintext (hostnames, IPs, DNS records)
+secrets.sops.json      ← SOPS-encrypted (passwords, tokens, keys)
+```
+
+### 3. Global decrypt, not per-stack hooks
+**CRITICAL**: Do NOT use `before_hook`/`after_hook` for decryption. With `terragrunt run --all`,
+70+ stacks run their hooks in parallel and all write to the same output file, so one stack can
+read the file while another is still writing it (race condition).
+
+Instead, use a wrapper script that decrypts once:
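+```bash
+#!/usr/bin/env bash
+# scripts/tg: decrypt secrets once, then exec terragrunt
+set -euo pipefail
+REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+SRC="$REPO_ROOT/secrets.sops.json"
+OUT="$REPO_ROOT/secrets.auto.tfvars.json"
+if [ ! -f "$OUT" ] || [ "$SRC" -nt "$OUT" ]; then
+  # Decrypt to a temp file (mktemp creates it with mode 0600) and rename on success,
+  # so a failed decrypt never leaves a truncated file that looks newer than the source
+  TMP="$(mktemp "$OUT.XXXXXX")"
+  if ! sops -d "$SRC" > "$TMP"; then rm -f "$TMP"; exit 1; fi
+  mv "$TMP" "$OUT"
+fi
+exec terragrunt "$@"
+```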
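+
+The decrypt-only-when-stale condition is the heart of the wrapper. Factored into a hypothetical `needs_decrypt` helper, it can be exercised without sops installed:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper mirroring the wrapper's condition: succeed (exit 0)
+# when the plaintext output is missing or older than the SOPS source.
+needs_decrypt() {
+  local src="$1" out="$2"
+  [ ! -f "$out" ] || [ "$src" -nt "$out" ]
+}
+```
+
+`-nt` compares modification times, so a `git pull` that rewrites `secrets.sops.json` triggers a fresh decrypt, while hand-editing the plaintext output does not (it becomes the newer file).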
+
+### 4. Terragrunt loads both (backward compatible)
+```hcl
+terraform {
+  extra_arguments "common_vars" {
+    commands = get_terraform_commands_that_need_vars()
+    required_var_files = ["${get_repo_root()}/config.tfvars"]
+    optional_var_files = [
+      "${get_repo_root()}/terraform.tfvars",        # legacy (git-crypt)
+      "${get_repo_root()}/secrets.auto.tfvars.json"  # new (SOPS)
+    ]
+  }
+  before_hook "check_secrets" {
+    commands = ["apply", "plan", "destroy"]
+    execute  = ["test", "-f", "${get_repo_root()}/secrets.auto.tfvars.json"]
+  }
+}
+```
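+
+Terragrunt skips `optional_var_files` that don't exist, and Terraform uses the last value it finds when a variable is set by several `-var-file` arguments, so with the order above a value present in both `terraform.tfvars` and `secrets.auto.tfvars.json` comes from the SOPS file. A rough shell emulation of the resulting arguments, assuming Terragrunt appends required files before optional ones in the listed order (`var_file_args` is a hypothetical name, not Terragrunt code):
+
+```bash
+#!/usr/bin/env bash
+# Illustrative emulation, not Terragrunt's implementation: the required var file
+# is always passed; optional ones only if present. Later -var-file args win.
+var_file_args() {
+  local root="$1" f
+  printf -- '-var-file=%s\n' "$root/config.tfvars"
+  for f in "$root/terraform.tfvars" "$root/secrets.auto.tfvars.json"; do
+    if [ -f "$f" ]; then printf -- '-var-file=%s\n' "$f"; fi
+  done
+}
+```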
+
+### 5. Complex types work in JSON
+Maps, lists, nested objects, multiline strings (SSH keys as `\n`-escaped) all work:
+```json
+{
+  "simple_password": "abc123",
+  "mailserver_accounts": {"user@domain": "pass"},
+  "ssh_key": "-----BEGIN OPENSSH PRIVATE KEY-----\nb3Blbn...\n-----END OPENSSH PRIVATE KEY-----\n"
+}
+```
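+
+To get a multiline key into that `\n`-escaped form without hand-editing, a small shell sketch (hypothetical `escape_key` helper). It assumes the key contains no `"` or `\` characters, which holds for PEM and OpenSSH key files; `jq -Rs .` is the general alternative when jq is available:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: join the lines of a key file with literal "\n"
+# sequences (including a trailing one), ready for a JSON string value.
+# Assumes no double quotes or backslashes in the input.
+escape_key() {
+  awk 'BEGIN { ORS = "" } { print $0 "\\n" }' "$1"
+}
+
+# Example: emit a one-key JSON object to merge into secrets.sops.json
+# printf '{"ssh_key": "%s"}\n' "$(escape_key ~/.ssh/deploy_key)"
+```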
+
+### 6. CI integration (Woodpecker)
+- Store the age private key as a CI secret (`SOPS_AGE_KEY`)
+- Write it to a temp file for `SOPS_AGE_KEY_FILE` (Woodpecker `from_secret` only exposes env vars)
+- `git add stacks/ state/ .woodpecker/`, NEVER `git add .` (it can stage the decrypted `secrets.auto.tfvars.json`)
+- Cleanup step with `status: [success, failure]` so it also runs when an earlier step fails
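+
+A shell sketch of the key-file, cleanup and `git add` rules, assuming the CI secret arrives as the `SOPS_AGE_KEY` env var and the step runs from the repo root. `setup_age_key_file`, `cleanup_age_key_file` and `check_staged` are hypothetical names:
+
+```bash
+#!/usr/bin/env bash
+# Write the age key from SOPS_AGE_KEY to a private temp file and point
+# sops at it via SOPS_AGE_KEY_FILE. Aborts the shell if the var is unset.
+setup_age_key_file() {
+  : "${SOPS_AGE_KEY:?SOPS_AGE_KEY must be set}"
+  local keyfile
+  keyfile="$(mktemp)"                 # mktemp creates the file with mode 0600
+  printf '%s\n' "$SOPS_AGE_KEY" > "$keyfile"
+  export SOPS_AGE_KEY_FILE="$keyfile"
+}
+
+# Cleanup (run on success AND failure): key file plus decrypted output.
+cleanup_age_key_file() {
+  if [ -n "${SOPS_AGE_KEY_FILE:-}" ]; then rm -f "$SOPS_AGE_KEY_FILE"; fi
+  rm -f secrets.auto.tfvars.json      # assumes cwd is the repo root
+}
+
+# Guard: read staged paths on stdin; fail if the decrypted file is among them.
+check_staged() {
+  if grep -qxF 'secrets.auto.tfvars.json'; then
+    echo "refusing to commit decrypted secrets.auto.tfvars.json" >&2
+    return 1
+  fi
+}
+
+# Usage: git add stacks/ state/ .woodpecker/ && git diff --cached --name-only | check_staged
+```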
+
+## Verification
+```bash
+# Encrypt
+sops -e -i secrets.sops.json
+
+# Decrypt and verify
+sops -d secrets.sops.json | jq .
+
+# Verify SSH keys
+sops -d secrets.sops.json | jq -r '.ssh_key' | ssh-keygen -l -f -
+
+# Test with terragrunt (validate checks decryption and config; plan also loads the var files)
+scripts/tg validate
+scripts/tg plan
+```
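+
+A cheap pre-commit style check that `secrets.sops.json` was actually encrypted before it is committed. It relies on two properties of SOPS JSON output, a top-level `"sops"` metadata object and values wrapped as `ENC[AES256_GCM,...]`; the helper name and the grep heuristics are assumptions, not a SOPS feature:
+
+```bash
+#!/usr/bin/env bash
+# Heuristic check (hypothetical helper): fail unless FILE looks SOPS-encrypted.
+looks_sops_encrypted() {
+  local file="$1"
+  grep -q '"sops"[[:space:]]*:' "$file" || return 1   # SOPS metadata block
+  grep -q '"ENC\[AES256_GCM,' "$file" || return 1     # at least one encrypted value
+}
+
+# Usage (e.g. in a pre-commit hook):
+# looks_sops_encrypted secrets.sops.json || { echo "run: sops -e -i secrets.sops.json" >&2; exit 1; }
+```
+
+This only catches a file that was never encrypted; a successful `sops -d` remains the real integrity check.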
+
+## Notes
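+- Keep git-crypt for binary files (TLS certs, deploy keys): SOPS handles those only in its binary mode, which encrypts the whole file as a single value rather than per value
+- `sensitive = true` on all secret variable declarations: redacts values in plan/apply output (they are still stored in plaintext in state)
+- Don't add `sensitive = true` to non-secret variables that merely have "secret" in the name (e.g., `tls_secret_name`) or to plain config such as `ingress_path`: Terraform rejects sensitive values, and values derived from them, as `for_each` arguments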
+- Age keys are one line — much simpler than GPG
+- `.sops.yaml` path_regex should be anchored: `^secrets\.sops\.json$`
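+
+Why the anchors matter, illustrated with bash's `=~` (POSIX ERE) as a stand-in for SOPS's own regex matching; for a pattern this simple both behave the same. The example paths are hypothetical, and this assumes the path SOPS compares is repo-relative, as the note implies:
+
+```bash
+#!/usr/bin/env bash
+# Return 0 if PATH matches REGEX (ERE), like a .sops.yaml path_regex test.
+matches_rule() {
+  [[ "$1" =~ $2 ]]
+}
+
+unanchored='secrets\.sops\.json'
+anchored='^secrets\.sops\.json$'
+
+for path in secrets.sops.json old/secrets.sops.json.bak stacks/x/secrets.sops.json; do
+  printf '%-30s unanchored=%s anchored=%s\n' "$path" \
+    "$(matches_rule "$path" "$unanchored" && echo yes || echo no)" \
+    "$(matches_rule "$path" "$anchored" && echo yes || echo no)"
+done
+```
+
+Only the repo-root `secrets.sops.json` matches the anchored rule; the unanchored one also matches backups and same-named files in subdirectories.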