infra/.claude/skills/setup-project/scripts/stability-gate.sh

72 lines
1.7 KiB
Bash

feat(setup-project): auto-PR working Dockerfiles back to upstream

## Context

The setup-project skill treats "build from a Dockerfile" as priority 6 ("last resort, avoid if possible"), with no formalized path for apps whose upstream lacks a working Dockerfile. When we end up writing one to get the deploy green, that Dockerfile stays private in the infra repo and upstream never benefits.

## This change

Adds a closed-loop flow: when we author a new Dockerfile (or fix a broken upstream one) and the deploy is healthy for 10 minutes, auto-open a PR against the upstream repo so the self-hosting community gets the working recipe.

Flow:

1. Classify dockerfile_state during research phase (image-used / used-as-is / fixed-broken-upstream / written-from-scratch). Persist to modules/kubernetes/<service>/.contribution-state.json.
2. After Terraform apply, run scripts/stability-gate.sh: polls pod Ready + HTTP 200 every 30s x 20 iterations, requires 18/20 successes.
3. On pass with a trigger state, scripts/contribute-dockerfile.sh does the GitHub API dance: fork → merge-upstream → branch → commit Dockerfile / .dockerignore / BUILD.md via Contents API → open PR with body rendered from templates/PR_BODY.md. Idempotent (skips on recorded PR URL, existing fork, existing branch, open PR, upstream landed a Dockerfile mid-deploy).

GitHub API via curl (gh CLI is sandbox-blocked per .claude/CLAUDE.md); token pulled from Vault (`secret/viktor` → `github_pat`). Commits include Signed-off-by for DCO-enforcing repos. Fork branch name is `add-dockerfile` for written-from-scratch or `fix-dockerfile` for fixed-broken-upstream, with timestamp suffix on collision.

## Files

- SKILL.md: state classification table, quality bar checklist, §8b stability gate, §10 contribute-upstream step, checklist updates
- scripts/stability-gate.sh: 10-minute health probe
- scripts/contribute-dockerfile.sh: GitHub API orchestrator
- templates/PR_BODY.md: `{{VAR}}` placeholder template for PR description
- templates/Dockerfile.README.md: BUILD.md template shipped with the PR

## What is NOT in this change

- No Woodpecker / GHA changes (skill-local flow).
- No auto-tracking of merge/reject outcomes upstream (manual follow-up).
- Not yet exercised end-to-end; first real-world run will validate the API dance. Plan to dry-run against a throwaway sink repo before pointing at a real upstream.

## Test Plan

### Automated

- bash -n on both scripts → pass
- Manual read-through of SKILL.md: step numbering coherent, existing §1-9 untouched semantics, new §8b/§10 reference real files

### Manual Verification

1. Next time setup-project onboards a Dockerfile-less app:
   - Confirm .contribution-state.json is written with `written-from-scratch`
   - Run stability-gate.sh, expect 18/20 passes on a healthy deploy
   - Run contribute-dockerfile.sh, expect a fork + branch + PR on ViktorBarzin
   - Verify contribution_pr_url is back-written to the state file
2. Re-run contribute-dockerfile.sh → must be a no-op (idempotent)
3. Upstream-archived case: manually archive a test upstream → re-run → expect SKIP, no PR created

[ci skip]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 18:12:13 +00:00
#!/usr/bin/env bash
# 10-minute deploy stability gate for setup-project skill.
# Polls pod readiness + HTTP 200 on target URL every 30s for 20 iterations.
# Requires 18/20 probes to succeed (tolerates 2 blips for restarts/DNS propagation).
#
# Usage:
#   stability-gate.sh <namespace> <app-label> <url>
#
# Example:
#   stability-gate.sh myapp myapp https://myapp.viktorbarzin.me
#
# Exit codes:
#   0 - Stable (>=18/20 probes OK)
#   1 - Unstable (<18/20 probes OK)
#   2 - Usage error
set -u
if [ "$#" -ne 3 ]; then
  echo "Usage: $0 <namespace> <app-label> <url>" >&2
  exit 2
fi
NS="$1"
APP="$2"
URL="$3"
TOTAL_PROBES=20
MIN_SUCCESSES=18
INTERVAL_SECONDS=30
ok_count=0
fail_count=0
echo "stability-gate: ns=$NS app=$APP url=$URL"
echo "stability-gate: $TOTAL_PROBES probes x ${INTERVAL_SECONDS}s (need $MIN_SUCCESSES/$TOTAL_PROBES)"
for i in $(seq 1 "$TOTAL_PROBES"); do
  probe_ok=true
  # Check 1: every pod matching the app label reports Ready within 25s.
  if ! kubectl wait --for=condition=Ready pod -l "app=$APP" -n "$NS" --timeout=25s >/dev/null 2>&1; then
    probe_ok=false
  fi
  # Check 2: the URL answers HTTP 200. On a failed transfer curl's -w output is
  # already "000", so overwrite rather than append to avoid logging "000" twice.
  status=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$URL") || status="000"
  if [ "$status" != "200" ]; then
    probe_ok=false
  fi
  if [ "$probe_ok" = "true" ]; then
    ok_count=$((ok_count + 1))
    printf "  probe %2d/%d: OK (http=%s)\n" "$i" "$TOTAL_PROBES" "$status"
  else
    fail_count=$((fail_count + 1))
    printf "  probe %2d/%d: FAIL (http=%s)\n" "$i" "$TOTAL_PROBES" "$status"
  fi
  # No need to wait after the final probe.
  if [ "$i" -lt "$TOTAL_PROBES" ]; then
    sleep "$INTERVAL_SECONDS"
  fi
done
echo "stability-gate: results ok=$ok_count fail=$fail_count"
if [ "$ok_count" -ge "$MIN_SUCCESSES" ]; then
  echo "stability-gate: PASS"
  exit 0
fi
echo "stability-gate: FAIL (need $MIN_SUCCESSES, got $ok_count)" >&2
exit 1
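
How a caller acts on the gate's exit code is not shown in this file. The following is a minimal sketch of that decision, not the actual logic in contribute-dockerfile.sh. The function name `should_contribute` is hypothetical, and treating `fixed-broken-upstream` and `written-from-scratch` as the trigger states is an assumption inferred from the commit message ("when we author a new Dockerfile (or fix a broken upstream one)").

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): decide whether the
# contribute-upstream step should run after the stability gate.
#   $1 = exit code of stability-gate.sh (0 stable, 1 unstable, 2 usage error)
#   $2 = dockerfile_state recorded during the research phase
should_contribute() {
  local gate_rc="$1" state="$2"
  # Any non-zero gate result (unstable or usage error) blocks the PR.
  [ "$gate_rc" -eq 0 ] || return 1
  case "$state" in
    # Assumed trigger states: we authored or repaired the Dockerfile.
    fixed-broken-upstream|written-from-scratch) return 0 ;;
    # image-used, used-as-is, or anything unrecognized: nothing to contribute.
    *) return 1 ;;
  esac
}
```

A caller could capture the gate result with `stability-gate.sh myapp myapp https://myapp.viktorbarzin.me; rc=$?` and then guard the next step with `if should_contribute "$rc" "$state"; then ...; fi`, where `$state` is read from the state file by whatever means the caller prefers.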