diff --git a/AGENTS.md b/AGENTS.md
index 797ed5df..f281fd33 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -289,6 +289,7 @@ curl -X POST -H "Authorization: token $TOK" -H 'Content-Type: application/json'
 ```
 
 ## Common Operations
+- **`homelab` CLI** (`/usr/local/bin/homelab`, source `cli/`): unified infra-ops verbs — run `homelab manifest` to discover the surface (each verb tagged read/write). v0.1 covers the inner loop: `homelab tf plan|fmt|apply ` (wraps `scripts/tg`; `apply` auto-claims presence + releases on exit, warns out-of-band), `homelab claim|release :`, `homelab work start|land|clean ` (worktree lifecycle; `land` gates on verification, `--verify-cmd`/`--no-verify`). Full docs: `cli/README.md`.
 - **Deploy new service**: Use `stacks//` as template. Create stack, add DNS in tfvars, apply platform then service.
 - **Fix crashed pods**: Run healthcheck first. Safe to delete evicted/failed pods and CrashLoopBackOff pods with >10 restarts.
 - **OOMKilled**: Check `kubectl describe limitrange tier-defaults -n `. Increase `resources.limits.memory` in the stack's main.tf.
diff --git a/cli/README.md b/cli/README.md
index 48b83c93..80ea0f52 100644
--- a/cli/README.md
+++ b/cli/README.md
@@ -1,2 +1,68 @@
-# What is this?
-This is a CLI to manipulate files in the terraform repo and commit and push them
+# homelab
+
+`homelab` is the unified, agent-facing CLI for operating this homelab — one
+composable, JSON-capable surface for the operations agents run over and over,
+discovered progressively at runtime. It is grown **in place** from this
+directory (the former `infra-cli`), and the legacy webhook use-cases still work
+(see below).
+
+It encodes *actions*, never *judgment*: methodology (debugging, TDD, review) and
+third-party/owned MCP servers (e.g. phpIPAM) are deliberately out of scope.
+
+## Usage
+
+```
+homelab [args]
+homelab manifest [--json] # list every verb + its read/write tier (discovery entrypoint)
+homelab version
+```
+
+### v0.1 verbs — the infra inner-loop
+
+| Command | Tier | What it does |
+|---|---|---|
+| `claim : --purpose "…"` | write | claim a shared resource on the presence board (wraps `scripts/presence`) |
+| `release :` | write | release a presence claim |
+| `tf plan ` | read | `scripts/tg plan` for a stack (resolved from cwd) |
+| `tf validate ` | read | `scripts/tg validate` |
+| `tf fmt ` | read | `terraform fmt -recursive` on the stack |
+| `tf force-unlock ` | write | release a stuck state lock |
+| `tf apply ` | write | `scripts/tg apply` — auto-claims `stack:`, always releases, warns it's out-of-band |
+| `work start ` | write | create `.worktrees/` on `/` off `/master`; enter with native `EnterWorktree` |
+| `work land [--verify-cmd "…"] [--no-verify]` | write | merge master in → verify → push `HEAD:master` (non-ff retry; PR fallback) |
+| `work clean ` | write | remove a task's worktree + branch (run from the main checkout) |
+
+`tf` resolves the stack dir by walking up from cwd to the infra root and
+delegates to `scripts/tg` (which owns state decrypt/encrypt, the Vault lock, and
+the ingress auth-comment check). git-crypt filter flags are auto-injected on git
+operations in the encrypted infra repo.
+
+**`work land` refuses to push when it cannot verify** (no `--verify-cmd` and no
+auto-detected suite) unless you pass `--no-verify` — landing to master unverified
+must be deliberate. It does not yet block on CI to green (that arrives with the
+ci/deploy watch verbs); it reminds you to follow the pipeline.
+
+Tiers are recorded per verb so a future PreToolUse classifier can auto-allow
+reads / prompt writes; v0.1 allows everything and relies on existing gates
+(permission mode, presence claims, plan approval).
+
+## Build / install
+
+Built from source to `/usr/local/bin/homelab` during devvm provisioning
+(`scripts/workstation/setup-devvm.sh`, the `t3-dispatch` pattern); version is
+stamped from `cli/VERSION` via ldflags. Manual build:
+
+```
+cd cli && go build -ldflags "-X main.version=$(cat VERSION)" -o /usr/local/bin/homelab .
+go test ./...
+```
+
+## Legacy webhook use-cases (preserved)
+
+This binary is also the in-cluster `infra-cli` image. Invocations starting with
+`-use-case=` fall through to the
+original flag-based path unchanged, so the webhook handler is unaffected.
+
+## Design
+
+See `infra/docs/adr/0004`–`0006` for the architecture decisions.
diff --git a/cli/VERSION b/cli/VERSION
new file mode 100644
index 00000000..b82608c0
--- /dev/null
+++ b/cli/VERSION
@@ -0,0 +1 @@
+v0.1.0
diff --git a/docs/adr/0004-homelab-unified-cli.md b/docs/adr/0004-homelab-unified-cli.md
new file mode 100644
index 00000000..27cce02a
--- /dev/null
+++ b/docs/adr/0004-homelab-unified-cli.md
@@ -0,0 +1,30 @@
+# homelab: a unified infra-ops CLI grown in place from infra/cli
+
+Agents re-derive the same operational command boilerplate every session — mining
+51,116 bash commands across 2,225 past sessions showed dense, repeated patterns
+(the infra inner-loop alone is ~29%). We are building `homelab`, one CLI encoding
+the deterministic, repeated **actions** (not judgment) agents run — composable in
+bash, JSON-capable, and discovered progressively via `homelab manifest`. It is
+grown **in place** in `cli/` (the existing `infra-cli`), absorbing new verb-groups
+alongside the preserved legacy webhook use-cases. Versioned with a `cli/VERSION`
+file (the infra repo deploys continuously and does not cut semver tags).
+
+## Considered options
+
+- **Its own top-level repo** (the original plan) — rejected in favour of keeping
+  it where the Terraform/Terragrunt and `scripts/tg` it drives already live; the
+  Go source isn't git-crypt-encrypted and a provision-time build is unaffected by
+  GitOps continuous-deploy.
+- **A fresh CLI ignoring infra-cli** — rejected: strands the VPN/DNS/email
+  webhook use-cases.
+- **Raw kubectl/tg/ssh + skills + MCP only** — kept for everything outside the
+  recurring action surface (methodology skills; third-party/owned MCP such as
+  phpIPAM, which homelab does NOT duplicate).
+
+## Consequences
+
+- The binary is dual-purpose: the agent-facing `homelab` verb surface AND the
+  in-cluster `infra-cli` webhook image. `main()` front-dispatches homelab verbs
+  and falls through to the legacy `-use-case` path verbatim.
+- Distribution: built from source to `/usr/local/bin/homelab` during devvm
+  provisioning (`t3-dispatch` precedent), refreshed by `t3-autoupdate`.
diff --git a/docs/adr/0005-homelab-v01-scope.md b/docs/adr/0005-homelab-v01-scope.md
new file mode 100644
index 00000000..c1da7a95
--- /dev/null
+++ b/docs/adr/0005-homelab-v01-scope.md
@@ -0,0 +1,23 @@
+# homelab v0.1 scope: the infra inner-loop; everything allowed, tiers recorded
+
+v0.1 ships only the highest-volume surface — the infra inner-loop: `work`
+(worktree lifecycle), `tf` (terragrunt via `scripts/tg` + fmt/validate/
+force-unlock), and `claim`/`release` (presence) — because it is ~29% of all mined
+commands and where agents lose the most time and leak the most presence claims.
+
+v0.1 enforces **no** homelab-level permission gating: everything is allowed,
+relying on existing gates (harness permission mode, presence claims, plan
+approval). But every verb records a `read|write` tier (visible in `manifest`), so
+a PreToolUse classifier hook (auto-allow reads / prompt writes) can be added
+later with zero restructuring.
+
+## Considered options
+
+- **Reads-first vertical slice** (top read verb per domain) — lower risk, broad
+  value, but defers the toil that motivated the project.
+- **One domain deep (k8s)** — cleanest template, narrow day-one value.
+
+We chose the highest-volume-but-write-heavy infra loop deliberately, accepting
+the extra complexity (worktree lifecycle, git-crypt flag injection, presence
+coupling, branch-protection PR fallback) for the biggest immediate toil
+reduction. k8s/node/secret/net/ci verb-groups are deferred to later versions.
diff --git a/docs/adr/0006-homelab-work-and-tf.md b/docs/adr/0006-homelab-work-and-tf.md
new file mode 100644
index 00000000..fcdddc30
--- /dev/null
+++ b/docs/adr/0006-homelab-work-and-tf.md
@@ -0,0 +1,29 @@
+# homelab work/tf behaviour: native worktree entry, gated auto-land, presence-coupled apply
+
+Four behaviours of the infra-loop verbs are surprising enough to record:
+
+1. **`work` owns worktree create/land/clean, but session *entry* delegates to the
+   native harness worktree tool.** A CLI is a child process and cannot change the
+   agent's working directory; `EnterWorktree` can. So `homelab work start `
+   creates the worktree + branch off `/master` (git-crypt-aware) and
+   prints the path — the agent enters it with native `EnterWorktree({path})`.
+
+2. **`work land` is auto-land, but gated on verification.** It merges master in →
+   runs verification → pushes `HEAD:master` (fetch+merge+retry on
+   non-fast-forward) → falls back to pushing the feature branch for a PR when the
+   direct push is rejected (branch protection). It **refuses to push when it
+   cannot verify** (no `--verify-cmd` and no auto-detected suite) unless
+   `--no-verify` is passed — added after an accidental smoke-test land pushed
+   unverified WIP to master (benign: the infra CI applied 0 stacks because the
+   diff was `cli/`-only, but an unverified land must be deliberate, not default).
+
+3. **`tf apply` is first-class despite GitOps, and mandatorily presence-coupled.**
+   Local applies are out-of-band (CI applies canonically on push) but happen
+   constantly (~763× in the corpus). `tf apply ` auto-claims `stack:`,
+   delegates to `scripts/tg apply --non-interactive`, and **always releases on
+   exit** (normal, error, or signal via `sync.Once` + handler) — fixing the
+   documented ~200-claim leak — and prints an out-of-band reminder.
+
+4. **Known v0.1 limitation:** `work land` does not yet block on CI to green; that
+   arrives with the ci/deploy watch verb-group. It prints a reminder to follow
+   the pipeline manually.
diff --git a/scripts/workstation/setup-devvm.sh b/scripts/workstation/setup-devvm.sh
index 37779867..1807bb80 100755
--- a/scripts/workstation/setup-devvm.sh
+++ b/scripts/workstation/setup-devvm.sh
@@ -175,6 +175,18 @@ if [[ ! -x /usr/local/bin/t3-dispatch ]]; then
     log "WARN: go absent -> cannot build t3-dispatch; install golang-go or deploy the binary"
   fi
 fi
+# 9b2) homelab: unified infra-ops CLI (agent-facing verbs + the in-cluster
+#      infra-cli webhook image). Rebuilt from cli/ each run so it tracks the
+#      repo; version stamped from cli/VERSION. See cli/README.md + docs/adr/0004-0006.
+if command -v go >/dev/null; then
+  _hl_src="$SCRIPTS/../cli"
+  _hl_ver="$(cat "$_hl_src/VERSION" 2>/dev/null || echo dev)"
+  log "building homelab CLI ($_hl_ver)"
+  ( cd "$_hl_src" && go build -ldflags "-X main.version=$_hl_ver" -o /usr/local/bin/homelab . ) \
+    || log "WARN: homelab CLI build failed"
+else
+  log "WARN: go absent -> cannot build homelab CLI"
+fi
 # 9c) sudoers: t3-dispatch may run ONLY t3-mint as root. A malformed file in
 #     /etc/sudoers.d breaks ALL sudo, so validate with visudo when available.
 if ! command -v visudo >/dev/null || visudo -cf "$SCRIPTS/sudoers-t3-autopair" >/dev/null; then