## Context
The setup-project skill treats "build from a Dockerfile" as priority 6 — "last
resort, avoid if possible" — with no formalized path for apps whose upstream
lacks a working Dockerfile. When we end up writing one to get the deploy green,
that Dockerfile stays private in the infra repo and upstream never benefits.
## This change
Adds a closed-loop flow: when we author a new Dockerfile (or fix a broken
upstream one) and the deploy is healthy for 10 minutes, auto-open a PR against
the upstream repo so the self-hosting community gets the working recipe.
Flow:
1. Classify dockerfile_state during research phase (image-used / used-as-is /
fixed-broken-upstream / written-from-scratch). Persist to
modules/kubernetes/<service>/.contribution-state.json.
2. After Terraform apply, run scripts/stability-gate.sh — polls pod Ready +
HTTP 200 every 30s x 20 iterations, requires 18/20 successes.
3. On pass with a trigger state, scripts/contribute-dockerfile.sh does the
GitHub API dance: fork → merge-upstream → branch → commit Dockerfile /
.dockerignore / BUILD.md via Contents API → open PR with body rendered from
templates/PR_BODY.md. Idempotent (skips on recorded PR URL, existing fork,
existing branch, open PR, upstream landed a Dockerfile mid-deploy).
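The step-3 sequence maps onto a handful of GitHub REST endpoints. The sketch below illustrates the idea and is not the contents of contribute-dockerfile.sh. It prints each request in dry-run mode, using a hypothetical `DRY_RUN` switch, so the order and payloads can be checked before anything touches a real upstream. The function names, the `Add Dockerfile` PR title, and the assumption that the fork's base-branch SHA is already known are all illustrative. Only the Dockerfile is committed here. The real flow also ships `.dockerignore` and `BUILD.md`.

```bash
#!/usr/bin/env bash
# Dry-run sketch of fork -> merge-upstream -> branch -> commit -> PR.
# Function names, the PR title and the BASE_SHA input are hypothetical;
# they are not taken from contribute-dockerfile.sh.
set -euo pipefail

API="https://api.github.com"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print each request instead of sending it

# Minimal JSON string escaping (backslash, double quote, newline, tab).
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# gh_api METHOD PATH [JSON_BODY]
gh_api() {
  local method="$1" path="$2" body="${3:-}"
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s %s %s\n' "$method" "$path" "$body"
    return 0
  fi
  local args=(-sS --fail -X "$method"
    -H "Authorization: Bearer ${GITHUB_TOKEN:?GITHUB_TOKEN is not set}"
    -H "Accept: application/vnd.github+json")
  if [ -n "$body" ]; then args+=(--data "$body"); fi
  curl "${args[@]}" "$API$path"
}

# contribute UPSTREAM FORK_OWNER BASE BRANCH BASE_SHA FILE MESSAGE
# BASE_SHA is assumed known; a live run would read it from
# GET /repos/{fork}/git/ref/heads/{base} after merge-upstream.
contribute() {
  local upstream="$1" fork_owner="$2" base="$3" branch="$4" sha="$5"
  local file="$6" msg="$7"
  local fork="$fork_owner/${upstream#*/}"
  local content
  content=$(base64 < "$file" | tr -d '\n')   # Contents API expects base64

  gh_api POST "/repos/$upstream/forks"        # async: a live run should wait for the fork
  gh_api POST "/repos/$fork/merge-upstream" "{\"branch\":\"$base\"}"
  gh_api POST "/repos/$fork/git/refs" "{\"ref\":\"refs/heads/$branch\",\"sha\":\"$sha\"}"
  gh_api PUT "/repos/$fork/contents/$(basename "$file")" \
    "{\"message\":\"$(json_escape "$msg")\",\"content\":\"$content\",\"branch\":\"$branch\"}"
  gh_api POST "/repos/$upstream/pulls" \
    "{\"title\":\"Add Dockerfile\",\"head\":\"$fork_owner:$branch\",\"base\":\"$base\"}"
}
```

For example, `contribute acme/app ViktorBarzin main add-dockerfile <sha> Dockerfile "$msg"` (with `acme/app` standing in for a real upstream) prints five lines, one per request. Exporting a `GITHUB_TOKEN` and setting `DRY_RUN=0` would send them instead.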
GitHub API via curl (gh CLI is sandbox-blocked per .claude/CLAUDE.md); token
pulled from Vault (`secret/viktor` → `github_pat`). Commits include
Signed-off-by for DCO-enforcing repos. Fork branch name is `add-dockerfile`
for written-from-scratch or `fix-dockerfile` for fixed-broken-upstream, with
timestamp suffix on collision.
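The branch-naming and DCO rules in the paragraph above are small enough to sketch directly. The helper names (`branch_for_state`, `signed_message`) and the `STAMP` override used for deterministic testing are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the fork-branch naming and DCO rules. Helper names and the STAMP
# override are hypothetical, not taken from contribute-dockerfile.sh.
set -euo pipefail

# branch_for_state STATE [EXISTING_BRANCH...]
# Prints the branch to create; returns 1 for states that never open a PR.
branch_for_state() {
  local state="$1"
  shift
  local name existing
  case "$state" in
    written-from-scratch)  name="add-dockerfile" ;;
    fixed-broken-upstream) name="fix-dockerfile" ;;
    *) return 1 ;;  # image-used / used-as-is: nothing to contribute
  esac
  for existing in "$@"; do
    if [ "$existing" = "$name" ]; then
      # Collision with an existing fork branch: add a UTC timestamp suffix.
      name="$name-${STAMP:-$(date -u +%Y%m%d%H%M%S)}"
      break
    fi
  done
  printf '%s\n' "$name"
}

# signed_message SUBJECT NAME EMAIL
# Appends the Signed-off-by trailer that DCO checks look for.
signed_message() {
  printf '%s\n\nSigned-off-by: %s <%s>\n' "$1" "$2" "$3"
}
```

In a live run the existing branch names would come from listing the fork's branches (`GET /repos/{owner}/{repo}/branches`). That lookup is left out of the sketch.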
## Files
- SKILL.md — state classification table, quality bar checklist, §8b stability
gate, §10 contribute-upstream step, checklist updates
- scripts/stability-gate.sh — 10-minute health probe
- scripts/contribute-dockerfile.sh — GitHub API orchestrator
- templates/PR_BODY.md — `{{VAR}}` placeholder template for PR description
- templates/Dockerfile.README.md — BUILD.md template shipped with the PR
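`templates/PR_BODY.md` uses `{{VAR}}` placeholders. One way to fill them in pure bash is literal pattern substitution. This sketch assumes a hypothetical `render_template FILE KEY=VALUE...` calling convention, which is not necessarily how contribute-dockerfile.sh invokes it.

```bash
#!/usr/bin/env bash
# Sketch: render a {{VAR}} placeholder template such as templates/PR_BODY.md.
# The KEY=VALUE argument convention is an assumption.
set -euo pipefail

# render_template TEMPLATE_FILE KEY=VALUE...
render_template() {
  local tpl="$1"
  shift
  local text pair key val
  text=$(cat "$tpl")
  for pair in "$@"; do
    key="${pair%%=*}"    # text before the first '='
    val="${pair#*=}"     # everything after it (may itself contain '=')
    # Quoted pattern and replacement: literal match, no glob or '&' expansion.
    text=${text//"{{$key}}"/"$val"}
  done
  printf '%s\n' "$text"
}
```

Quoting the replacement matters. Under bash 5.2's default `patsub_replacement`, an unquoted `&` in a value such as a PR title would expand to the matched placeholder.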
## What is NOT in this change
- No Woodpecker / GHA changes (skill-local flow).
- No auto-tracking of merge/reject outcomes upstream (manual follow-up).
- Not yet exercised end-to-end; first real-world run will validate the API
dance. Plan to dry-run against a throwaway sink repo before pointing at a
real upstream.
## Test Plan
### Automated
- bash -n on both scripts → pass
- Manual read-through of SKILL.md: step numbering is coherent, the semantics
of existing §1-9 are unchanged, and the new §8b/§10 reference files that exist
### Manual Verification
1. Next time setup-project onboards a Dockerfile-less app:
   - Confirm .contribution-state.json is written with `written-from-scratch`
   - Run stability-gate.sh: expect at least 18/20 passes (PASS) on a healthy deploy
   - Run contribute-dockerfile.sh: expect a fork + branch + PR on ViktorBarzin
   - Verify contribution_pr_url is back-written to the state file
2. Re-run contribute-dockerfile.sh → must be a no-op (idempotent)
3. Upstream-archived case: manually archive a test upstream → re-run →
   expect SKIP, no PR created
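Step 2's no-op re-run hinges on the back-written `contribution_pr_url`. A crude version of that check is sketched below. It assumes the state file is a flat, single-level JSON object, and a parser such as `jq` would be more robust. The helper name `already_contributed` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: skip a re-run when .contribution-state.json already records a PR URL.
# Assumes flat JSON; grep is a stand-in for a real JSON parser.
set -euo pipefail

# already_contributed STATE_FILE
# Exit 0 if a non-empty "contribution_pr_url" string is present, else 1.
already_contributed() {
  local f="$1"
  [ -f "$f" ] || return 1
  grep -Eq '"contribution_pr_url"[[:space:]]*:[[:space:]]*"[^"]+"' "$f"
}
```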
[ci skip]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```bash
#!/usr/bin/env bash
# 10-minute deploy stability gate for setup-project skill.
# Polls pod readiness + HTTP 200 on target URL every 30s for 20 iterations.
# Requires 18/20 probes to succeed (tolerates 2 blips for restarts/DNS propagation).
#
# Usage:
#   stability-gate.sh <namespace> <app-label> <url>
#
# Example:
#   stability-gate.sh myapp myapp https://myapp.viktorbarzin.me
#
# Exit codes:
#   0 - Stable (>=18/20 probes OK)
#   1 - Unstable (<18/20 probes OK)
#   2 - Usage error

set -u

if [ "$#" -ne 3 ]; then
  echo "Usage: $0 <namespace> <app-label> <url>" >&2
  exit 2
fi

NS="$1"
APP="$2"
URL="$3"

TOTAL_PROBES=20
MIN_SUCCESSES=18
# 19 sleeps x 30s is ~9.5 min on a healthy deploy; slow probes (up to 25s
# kubectl wait + 10s curl each) can stretch an unhealthy run past 10 minutes.
INTERVAL_SECONDS=30

ok_count=0
fail_count=0

echo "stability-gate: ns=$NS app=$APP url=$URL"
echo "stability-gate: $TOTAL_PROBES probes x ${INTERVAL_SECONDS}s (need $MIN_SUCCESSES/$TOTAL_PROBES)"

for i in $(seq 1 "$TOTAL_PROBES"); do
  probe_ok=true

  # Pod check: every pod labelled app=$APP in $NS must be Ready within 25s.
  if ! kubectl wait --for=condition=Ready pod -l "app=$APP" -n "$NS" --timeout=25s >/dev/null 2>&1; then
    probe_ok=false
  fi

  # HTTP check: curl's -w already prints "000" when the request fails, so
  # only fall back to "000" if nothing was printed at all.
  status=$(curl -sS -o /dev/null -w "%{http_code}" --max-time 10 "$URL")
  status="${status:-000}"
  if [ "$status" != "200" ]; then
    probe_ok=false
  fi

  if [ "$probe_ok" = "true" ]; then
    ok_count=$((ok_count + 1))
    printf " probe %2d/%d: OK (http=%s)\n" "$i" "$TOTAL_PROBES" "$status"
  else
    fail_count=$((fail_count + 1))
    printf " probe %2d/%d: FAIL (http=%s)\n" "$i" "$TOTAL_PROBES" "$status"
  fi

  # No sleep after the final probe.
  if [ "$i" -lt "$TOTAL_PROBES" ]; then
    sleep "$INTERVAL_SECONDS"
  fi
done

echo "stability-gate: results ok=$ok_count fail=$fail_count"

if [ "$ok_count" -ge "$MIN_SUCCESSES" ]; then
  echo "stability-gate: PASS"
  exit 0
fi

echo "stability-gate: FAIL (need $MIN_SUCCESSES, got $ok_count)" >&2
exit 1
```