From 7cc7991ce6b2b6365fd5724c7605b8aeff11b879 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Sat, 7 Mar 2026 15:46:36 +0000
Subject: [PATCH] [ci skip] claudeception: extract 2 skills from today's session

1. sops-age-secrets-migration: Complete guide for migrating from git-crypt
   to SOPS+age. Covers JSON format requirement, race condition avoidance,
   CI integration, complex types, and migration sequence.

2. iterative-plan-review-with-subagents: Design pattern for reviewing plans
   with parallel security + implementation subagents. 2-3 iterations to zero
   CRITICALs. Used successfully for the SOPS migration design.
---
 .../SKILL.md                            |  80 ++++++++++++
 .../sops-age-secrets-migration/SKILL.md | 116 ++++++++++++++++++
 2 files changed, 196 insertions(+)
 create mode 100644 .claude/skills/archived/iterative-plan-review-with-subagents/SKILL.md
 create mode 100644 .claude/skills/archived/sops-age-secrets-migration/SKILL.md

diff --git a/.claude/skills/archived/iterative-plan-review-with-subagents/SKILL.md b/.claude/skills/archived/iterative-plan-review-with-subagents/SKILL.md
new file mode 100644
index 00000000..6df5d3ef
--- /dev/null
+++ b/.claude/skills/archived/iterative-plan-review-with-subagents/SKILL.md
@@ -0,0 +1,80 @@
+---
+name: iterative-plan-review-with-subagents
+description: |
+  Design pattern for reviewing implementation plans using parallel subagent reviewers
+  with iterative refinement. Use when: (1) designing a complex infrastructure change
+  that needs security + implementation review, (2) creating a migration plan with
+  multiple phases, (3) any plan where missing a critical issue could cause data loss
+  or security exposure. Spawns 2 reviewer agents (security + implementation), collects
+  CRITICAL/IMPORTANT/NIT findings, fixes all CRITICALs, re-runs until zero CRITICALs.
+  Typically converges in 2-3 iterations.
+author: Claude Code
+version: 1.0.0
+date: 2026-03-07
+---
+
+# Iterative Plan Review with Subagents
+
+## Problem
+Complex infrastructure plans have blind spots — security issues, implementation
+incompatibilities, race conditions, format mismatches. A single reviewer misses things.
+Multiple reviewers with different expertise catch more.
+
+## Context / Trigger Conditions
+- Writing a migration plan (e.g., secrets management, storage migration)
+- Designing a multi-phase infrastructure change
+- Any plan where a missed issue = downtime, data loss, or security exposure
+- User explicitly asks for plan review
+
+## Solution
+
+### 1. Write the plan as a markdown document
+Save to `docs/plans/YYYY-MM-DD-.md`
+
+### 2. Spawn 2 reviewer agents in parallel
+```
+Agent 1: Security reviewer
+- Focus: secret exposure, access control, key management, CI pipeline security
+- Classify each finding: CRITICAL / IMPORTANT / NIT
+
+Agent 2: Implementation reviewer
+- Focus: format compatibility, race conditions, ordering, tool behavior
+- Classify each finding: CRITICAL / IMPORTANT / NIT
+```
+
+Key: give each reviewer specific focus areas and the actual source code to check against.
+
+### 3. Consolidate and fix CRITICALs
+- Merge findings from both reviewers
+- Deduplicate (both often find the same issue)
+- Fix ALL CRITICALs in the plan document
+- Note IMPORTANTs for implementation phase
+
+### 4. Re-run reviewers on the updated plan
+- Same 2 agents, but tell them which CRITICALs were fixed
+- Ask them to VERIFY fixes are correct AND find new issues
+- Repeat until zero CRITICALs
+
+### 5. Typical convergence
+- v1: 5-6 CRITICALs (format issues, race conditions, missing steps)
+- v2: 2-3 CRITICALs (fixes introduced new issues, missed edge cases)
+- v3: 0 CRITICALs, only IMPORTANTs remaining
+
+## Example Findings from Real Usage (SOPS migration)
+
+| Iteration | CRITICALs Found | Examples |
+|-----------|----------------|---------|
+| v1 | 6 | YAML≠HCL format, `git add .` commits secrets, no branch protection, parallel race condition |
+| v2 | 3 | `SOPS_AGE_KEY_FILE` misunderstanding, `renew-tls.yml` not updated, plan leaks in PR logs |
+| v3 | 0 | All verified fixed. 6 IMPORTANTs noted for implementation. |
+
+## Verification
+- Zero CRITICALs from both reviewers on the final iteration
+- IMPORTANTs documented as implementation notes (not blockers)
+
+## Notes
+- Use `sonnet` model for reviewers (fast, thorough enough for review)
+- Give reviewers actual source code paths to read, not just the plan
+- Tell v2+ reviewers what was fixed so they verify, not re-discover
+- The final review should say "ONLY report CRITICALs" to avoid noise
+- This pattern cost ~$3-5 in API calls but caught issues that would have caused hours of debugging
diff --git a/.claude/skills/archived/sops-age-secrets-migration/SKILL.md b/.claude/skills/archived/sops-age-secrets-migration/SKILL.md
new file mode 100644
index 00000000..814ce939
--- /dev/null
+++ b/.claude/skills/archived/sops-age-secrets-migration/SKILL.md
@@ -0,0 +1,116 @@
+---
+name: sops-age-secrets-migration
+description: |
+  Migrate from git-crypt to SOPS + age for multi-user secret management in a
+  Terraform/Terragrunt infrastructure repo. Use when: (1) need per-user secret
+  access control (git-crypt is all-or-nothing), (2) want operators to push PRs
+  without seeing secrets (CI decrypts), (3) migrating from a single encrypted
+  terraform.tfvars to structured secret management. Covers: JSON format (not YAML
+  — Terraform can't parse YAML tfvars), race condition avoidance with parallel
+  terragrunt applies, CI pipeline integration with Woodpecker, age key management,
+  and the complete migration sequence.
+author: Claude Code
+version: 1.0.0
+date: 2026-03-07
+---
+
+# SOPS + age Secrets Migration from git-crypt
+
+## Problem
+git-crypt encrypts entire files — anyone with the key decrypts everything. For multi-user
+setups where operators should push code without seeing secrets, you need per-value encryption
+with CI-only decryption.
+
+## Context / Trigger Conditions
+- Single `terraform.tfvars` encrypted with git-crypt containing 100+ secrets
+- Need to onboard operators who shouldn't see API keys, passwords, SSH keys
+- Want GitOps (secrets in git) but with access control
+- Terraform/Terragrunt stack-per-service architecture
+
+## Solution
+
+### 1. Use JSON, not YAML
+SOPS outputs the same format as input. `sops -d file.yaml` → YAML. `sops -d file.json` → JSON.
+Terraform natively supports `*.auto.tfvars.json` files. YAML is NOT valid HCL.
+
+```
+secrets.sops.json → sops -d → secrets.auto.tfvars.json → Terraform reads it
+```
+
+### 2. Split tfvars into config + secrets
+```
+config.tfvars ← plaintext (hostnames, IPs, DNS records)
+secrets.sops.json ← SOPS-encrypted (passwords, tokens, keys)
+```
+
+### 3. Global decrypt, not per-stack hooks
+**CRITICAL**: Do NOT use `before_hook`/`after_hook` for decryption. With `terragrunt run --all`,
+70+ stacks run hooks in parallel, all writing to the same output file — race condition.
+
+Instead, use a wrapper script that decrypts once:
+```bash
+#!/usr/bin/env bash
+# scripts/tg — decrypt then terragrunt
+REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+if [ ! -f "$REPO_ROOT/secrets.auto.tfvars.json" ] || \
+   [ "$REPO_ROOT/secrets.sops.json" -nt "$REPO_ROOT/secrets.auto.tfvars.json" ]; then
+  sops -d "$REPO_ROOT/secrets.sops.json" > "$REPO_ROOT/secrets.auto.tfvars.json"
+fi
+exec terragrunt "$@"
+```
+
+### 4. Terragrunt loads both (backward compatible)
+```hcl
+terraform {
+  extra_arguments "common_vars" {
+    commands = get_terraform_commands_that_need_vars()
+    required_var_files = ["${get_repo_root()}/config.tfvars"]
+    optional_var_files = [
+      "${get_repo_root()}/terraform.tfvars", # legacy (git-crypt)
+      "${get_repo_root()}/secrets.auto.tfvars.json" # new (SOPS)
+    ]
+  }
+  before_hook "check_secrets" {
+    commands = ["apply", "plan", "destroy"]
+    execute = ["test", "-f", "${get_repo_root()}/secrets.auto.tfvars.json"]
+  }
+}
+```
+
+### 5. Complex types work in JSON
+Maps, lists, nested objects, multiline strings (SSH keys as `\n`-escaped) all work:
+```json
+{
+  "simple_password": "abc123",
+  "mailserver_accounts": {"user@domain": "pass"},
+  "ssh_key": "-----BEGIN OPENSSH PRIVATE KEY-----\nb3Blbn...\n-----END OPENSSH PRIVATE KEY-----\n"
+}
+```
+
+### 6. CI integration (Woodpecker)
+- Store age private key as CI secret (`SOPS_AGE_KEY`)
+- Write to temp file for `SOPS_AGE_KEY_FILE` (Woodpecker `from_secret` only does env vars)
+- `git add stacks/ state/ .woodpecker/` — NEVER `git add .`
+- Cleanup step with `status: [success, failure]`
+
+## Verification
+```bash
+# Encrypt
+sops -e -i secrets.sops.json
+
+# Decrypt and verify
+sops -d secrets.sops.json | jq .
+
+# Verify SSH keys
+sops -d secrets.sops.json | jq -r '.ssh_key' | ssh-keygen -l -f -
+
+# Test with terragrunt
+scripts/tg validate
+```
+
+## Notes
+- Keep git-crypt for binary files (TLS certs, deploy keys): SOPS can only encrypt those whole, as opaque blobs, so per-value encryption does not apply
+- `sensitive = true` on all secret variable declarations — prevents plan output leaks
+- Don't add `sensitive = true` to non-secret variables, including ones with "secret" in the name (e.g., `tls_secret_name`, `ingress_path`): Terraform rejects sensitive values in `for_each`, so it breaks `for_each` on lists
+- Age keys are one line — much simpler than GPG
+- `.sops.yaml` path_regex should be anchored: `^secrets\.sops\.json$`
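
Review note on the `scripts/tg` wrapper added in `sops-age-secrets-migration/SKILL.md`: it redirects `sops -d` straight into `secrets.auto.tfvars.json`, so a failed decrypt can leave behind an empty file that is newer than `secrets.sops.json`, and the `-nt` staleness check then skips decryption on the next run. The following is a minimal sketch of one way to avoid that, not part of the patch. The helper name `decrypt_if_stale` is hypothetical, and the sketch assumes `mktemp` and `mv` are available. It decrypts into a temporary file in the same directory and renames it into place only when `sops` succeeds.

```bash
#!/usr/bin/env bash
# Sketch only: an atomic variant of the decrypt step in scripts/tg.
set -euo pipefail

# decrypt_if_stale SRC DST  (hypothetical helper, not part of the patch)
# Re-decrypts SRC into DST only when DST is missing or older than SRC.
decrypt_if_stale() {
  local src="$1" dst="$2" tmp
  if [ -f "$dst" ] && [ ! "$src" -nt "$dst" ]; then
    return 0  # plaintext already exists and is not older than the source
  fi
  # Temp file lives next to DST so the final mv is a rename on one filesystem.
  tmp="$(mktemp "${dst}.XXXXXX")"
  if ! sops -d "$src" > "$tmp"; then
    rm -f "$tmp"  # never leave a partial or empty plaintext file behind
    return 1
  fi
  chmod 600 "$tmp"  # decrypted secrets stay readable by the owner only
  mv -f "$tmp" "$dst"
}
```

In the wrapper this could be called as `decrypt_if_stale "$REPO_ROOT/secrets.sops.json" "$REPO_ROOT/secrets.auto.tfvars.json"` just before `exec terragrunt "$@"`. With `set -e` in effect, a failed decrypt then stops the script instead of running Terragrunt against a truncated file.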