The remaining verbs that pass the "saves reasoning, not just typing" test the user posed mid-session: each encodes the non-obvious which-endpoint-reached-how resolution otherwise re-derived every time. (Same test deprioritized node-ssh and secret-get aliasing — thin wrappers over commands already known.)

- net check <host> [path]: two-legged reachability — external (public DNS→CF) vs internal (Traefik LB) — so you see WHERE a break is, not just that one path works. (live: surfaced the LB at 6ms vs CF 77ms.)
- dns lookup <name> [type]: Technitium (10.0.20.201) vs public (1.1.1.1) diff.
- metrics query "<promql>" / metrics alerts: Prometheus via the LB (prometheus-query.viktorbarzin.lan); alerts uses the synthetic ALERTS series since the query frontend has no /api/v1/alerts and Alertmanager has no ingress.
- logs query "<logql>" [--since 1h] [--limit N]: Loki range query via the LB.

All reach auth-free internal ingresses through the LB (Go form of curl --resolve host:443:10.0.20.203) — no port-forward, no kubectl. In-cluster-only endpoints (Alertmanager v2) deliberately out of scope.

Verified live before building; all five smoke-tested green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# homelab

`homelab` is the unified, agent-facing CLI for operating this homelab — one
composable, JSON-capable surface for the operations agents run over and over,
discovered progressively at runtime. It is grown **in place** from this
directory (the former `infra-cli`), and the legacy webhook use-cases still work
(see below).

It encodes *actions*, never *judgment*: methodology (debugging, TDD, review) and
third-party/owned MCP servers (e.g. phpIPAM) are deliberately out of scope.

## Usage

```
homelab <command> [args]
homelab manifest [--json] # list every verb + its read/write tier (discovery entrypoint)
homelab version
```

### v0.1 verbs — the infra inner-loop

| Command | Tier | What it does |
|---|---|---|
| `claim <kind>:<name> --purpose "…"` | write | claim a shared resource on the presence board (wraps `scripts/presence`) |
| `release <kind>:<name>` | write | release a presence claim |
| `tf plan <stack>` | read | `scripts/tg plan` for a stack (resolved from cwd) |
| `tf validate <stack>` | read | `scripts/tg validate` |
| `tf fmt <stack>` | read | `terraform fmt -recursive` on the stack |
| `tf force-unlock <stack> <lock-id>` | write | release a stuck state lock |
| `tf apply <stack>` | write | `scripts/tg apply` — auto-claims `stack:<name>`, always releases, warns it's out-of-band |
| `work start <topic>` | write | create `.worktrees/<topic>` on `<user>/<topic>` off `<remote>/master`; enter with native `EnterWorktree` |
| `work land [--verify-cmd "…"] [--no-verify]` | write | merge master in → verify → push `HEAD:master` (non-ff retry; PR fallback) |
| `work clean <topic>` | write | remove a task's worktree + branch (run from the main checkout) |

### v0.2 verbs — Kubernetes

Built on an **app→namespace→pod resolver**: `<app>` defaults to the namespace
(most namespaces hold one app); the target defaults to `deploy/<app>` and lets
kubectl resolve the pod. Override with `-n`/`--pod`/`-c`/`-l`/`--tty`. Uses the
ambient kubeconfig.
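
The resolver defaults described above can be sketched roughly as follows. This is an illustration, not the CLI's actual code: the `Overrides` type, the `resolve` function, and the `grafana` / `monitoring` names are hypothetical, and only the `-n` and `--pod` overrides are modelled.

```go
package main

import "fmt"

// Overrides mirrors the -n and --pod flags. The type is hypothetical.
type Overrides struct {
	Namespace string // -n
	Pod       string // --pod
}

// resolve applies the documented defaults: the namespace is the app name and
// the target is deploy/<app>, unless a flag overrides either one.
func resolve(app string, o Overrides) (namespace, target string) {
	namespace = app
	if o.Namespace != "" {
		namespace = o.Namespace
	}
	target = "deploy/" + app
	if o.Pod != "" {
		target = "pod/" + o.Pod
	}
	return namespace, target
}

func main() {
	// "grafana" and "monitoring" are placeholder names for illustration.
	ns, target := resolve("grafana", Overrides{})
	fmt.Printf("kubectl -n %s logs %s\n", ns, target)

	ns, target = resolve("grafana", Overrides{Namespace: "monitoring"})
	fmt.Printf("kubectl -n %s logs %s\n", ns, target)
}
```

Keeping the defaults in one function means every `k8s` verb resolves an app the same way, and the override flags only have to be handled once.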

| Command | Tier | What it does |
|---|---|---|
| `k8s status [ns]` | read | pods (wide) + recent non-Normal events (`-A` if no ns) |
| `k8s get <ns> <resource> […]` | read | `kubectl -n <ns> get …` passthrough |
| `k8s logs <app>` | read | logs for `deploy/<app>` (`--tail` default 200; `-c`/`--previous`/`--since`/`-l`) |
| `k8s describe <app> [resource]` | read | describe the deployment (or an explicit resource) |
| `k8s debug <app>` | read | one-shot triage: pods + workloads + describe + recent logs + events |
| `k8s pf <app> <local:remote> [target]` | read | port-forward to `svc/<app>` (or an explicit target) |
| `k8s rollout-status <app>` | read | `rollout status deploy/<app>` |
| `k8s db <app> [--mysql] [--db N] -- "<SQL>"` | write | exec into the dbaas DB (PG `pg-cluster-rw`, or MySQL with env-password wrapper) |
| `k8s exec <app> [--tty] -- <cmd>` | write | exec in the app's pod |
| `k8s restart <app>` | write | `rollout restart deploy/<app>` then wait for status |
| `k8s rm-pod <name> -n <ns> [--job] [--force]` | write | delete a stuck **pod/job only** |

Config-mutation verbs (`apply`/`edit`/`patch`/`scale`/`create`) are intentionally
**not** exposed — they stay raw `kubectl`, per the Terraform-only policy.

`tf` resolves the stack dir by walking up from cwd to the infra root and
delegates to `scripts/tg` (which owns state decrypt/encrypt, the Vault lock, and
the ingress auth-comment check). git-crypt filter flags are auto-injected on git
operations in the encrypted infra repo.
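
The walk-up from cwd to the infra root could look something like the sketch below. How the real CLI recognises the root is not stated here, so the marker check (`hasTg`, which looks for `scripts/tg`) is an assumption, and `findInfraRoot` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// findInfraRoot walks up from start until isRoot reports true for a directory.
func findInfraRoot(start string, isRoot func(dir string) bool) (string, error) {
	dir := filepath.Clean(start)
	for {
		if isRoot(dir) {
			return dir, nil
		}
		parent := filepath.Dir(dir)
		if parent == dir { // reached the filesystem root without a match
			return "", errors.New("infra root not found above " + start)
		}
		dir = parent
	}
}

// hasTg is an assumed marker check: treat the directory that contains
// scripts/tg as the infra root.
func hasTg(dir string) bool {
	_, err := os.Stat(filepath.Join(dir, "scripts", "tg"))
	return err == nil
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		fmt.Println("getwd:", err)
		return
	}
	root, err := findInfraRoot(cwd, hasTg)
	fmt.Println(root, err)
}
```

Passing the root test in as a function keeps the walk itself free of filesystem access, which makes it easy to exercise with plain paths.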

**`work land` refuses to push when it cannot verify** (no `--verify-cmd` and no
auto-detected suite) unless you pass `--no-verify` — landing to master unverified
must be deliberate. After pushing it **watches CI to green** (`ci watch` on the
landed commit) and fails if the pipeline does; pass `--no-ci-watch` to skip.
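
The verify gate above amounts to a small decision, sketched here under assumptions: `LandOptions` and `verifyCommand` are hypothetical names, and the precedence between `--no-verify` and `--verify-cmd` when both are given is a guess.

```go
package main

import (
	"errors"
	"fmt"
)

// LandOptions is a hypothetical view of the relevant `work land` inputs.
type LandOptions struct {
	VerifyCmd     string // --verify-cmd
	DetectedSuite string // auto-detected test command, empty if none was found
	NoVerify      bool   // --no-verify
}

var errCannotVerify = errors.New("refusing to push: nothing to verify with; pass --no-verify to land unverified")

// verifyCommand returns the command to run before pushing. An empty command
// with a nil error means verification was explicitly skipped.
func verifyCommand(o LandOptions) (string, error) {
	switch {
	case o.NoVerify:
		return "", nil
	case o.VerifyCmd != "":
		return o.VerifyCmd, nil
	case o.DetectedSuite != "":
		return o.DetectedSuite, nil
	default:
		return "", errCannotVerify
	}
}

func main() {
	for _, o := range []LandOptions{
		{VerifyCmd: "go test ./..."},
		{},
		{NoVerify: true},
	} {
		cmd, err := verifyCommand(o)
		fmt.Printf("cmd=%q err=%v\n", cmd, err)
	}
}
```

The point of the default branch is that an unverified push can only happen through an explicit flag, never through a missing one.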

Tiers are recorded per verb so a future PreToolUse classifier can auto-allow
reads / prompt writes; v0.1 allows everything and relies on existing gates
(permission mode, presence claims, plan approval).
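
One possible shape for such a classifier is sketched below. It does not exist yet, so everything here is hypothetical: the `Tier` type, the `decide` function, the `"allow"` / `"ask"` results, and the choice to treat unknown verbs like writes. The verb-to-tier pairs are copied from the tables in this README.

```go
package main

import "fmt"

// Tier is the read/write label recorded for each verb.
type Tier string

const (
	Read  Tier = "read"
	Write Tier = "write"
)

// tiers is a small excerpt of the verb tables. A real classifier would load
// the full set rather than hard-code it.
var tiers = map[string]Tier{
	"tf plan":     Read,
	"tf apply":    Write,
	"k8s logs":    Read,
	"k8s restart": Write,
	"ci watch":    Read,
}

// decide auto-allows reads and asks for confirmation on writes. Unknown verbs
// are treated like writes so that nothing is allowed by accident.
func decide(verb string) string {
	if tiers[verb] == Read {
		return "allow"
	}
	return "ask"
}

func main() {
	for _, verb := range []string{"tf plan", "tf apply", "k8s frobnicate"} {
		fmt.Printf("%-16s %s\n", verb, decide(verb))
	}
}
```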

### v0.3 verbs — memory

A thin HTTP client over the **claude-memory** service (the same backend the
memory MCP wraps), authed with `CLAUDE_MEMORY_API_KEY` against
`CLAUDE_MEMORY_API_URL` (the env the hooks already set; defaults to the
ingress). Because it hits the HTTP API directly, it **works even when the MCP
frontend is down**.

| Command | Tier | What it does |
|---|---|---|
| `memory recall "<context>" [--query --category --sort --limit]` | read | semantic search (server-side ranking) — the navigate workhorse |
| `memory list [--category --tag --limit]` | read | recent memories |
| `memory categories` / `memory tags` / `memory stats` | read | enumerate the store |
| `memory secret <id>` | read | reveal a sensitive memory's content |
| `memory store "<content>" [--category --tags --keywords --importance --sensitive]` | write | store a memory |
| `memory update <id> [--content --tags --importance]` | write | edit a memory |
| `memory delete <id>` | write | delete a memory |

All read/write paths are validated against the live API (incl. a
store→recall→delete round-trip). This gives full data-plane parity with the MCP;
the eventual deprecation (rewiring the per-prompt auto-recall + auto-learn hooks
to the CLI, then uninstalling the MCP) is a **separate, deliberate follow-up** —
see `docs/adr/0008`.

### v0.4 verbs — ci / deploy

Watch what you trigger, without hand-rolling Woodpecker/kubectl polling. `ci`
talks to the Woodpecker API (token from `WOODPECKER_TOKEN` or Vault
`secret/ci/global`) via the internal Traefik LB, resolving the repo from the cwd
remote, with retries that ride Woodpecker's intermittent empty responses.
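
Retrying through empty responses can be sketched as below. This is a generic pattern, not the CLI's implementation: `fetchWithRetry`, `errEmpty`, the attempt count, and the fixed delay are all assumptions, and the fetch function in `main` only simulates an API that answers empty twice.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errEmpty = errors.New("empty response")

// fetchWithRetry calls fetch until it returns a non-empty body. An empty body
// with no error is treated as a transient failure and retried.
func fetchWithRetry(fetch func() ([]byte, error), attempts int, delay time.Duration) ([]byte, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		body, err := fetch()
		if err == nil && len(body) > 0 {
			return body, nil
		}
		if err == nil {
			err = errEmpty
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return nil, fmt.Errorf("after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	body, err := fetchWithRetry(func() ([]byte, error) {
		calls++
		if calls < 3 {
			return nil, nil // simulate an intermittent empty response
		}
		return []byte(`[{"status":"success"}]`), nil
	}, 5, 10*time.Millisecond)
	fmt.Println(string(body), err, "calls:", calls)
}
```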

| Command | Tier | What it does |
|---|---|---|
| `ci status [commit]` | read | pipeline status for HEAD (or a commit) |
| `ci watch [commit]` | read | poll the pipeline to terminal; exit non-zero on failure |
| `deploy wait <ns>/<deploy> [--sha SHA]` | read | wait for the deployment image to match the sha, *then* rollout status (rollout status alone lies on the old ReplicaSet) |

`work land` now calls `ci watch` on the landed commit automatically (skip with
`--no-ci-watch`), closing the v0.1 "doesn't wait for CI" gap. `ci logs` (failing
step) is deferred to v0.4.1 — Woodpecker's per-pipeline detail/log endpoints were
the least reliable; `status`/`watch` use the list endpoint that works.
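
The ordering that `deploy wait` relies on (image first, rollout status second) can be illustrated with the sketch below. It assumes images are tagged with the commit sha, which this README does not confirm; `imageMatches`, `waitForImage`, and the `registry.example` image names are hypothetical, and the image lookup is simulated rather than read from the cluster.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// imageMatches reports whether the image tag and the sha agree, allowing one
// to be a short prefix of the other. Assumes the tag is the commit sha.
func imageMatches(image, sha string) bool {
	i := strings.LastIndex(image, ":")
	if i < 0 || i == len(image)-1 || sha == "" {
		return false
	}
	tag := image[i+1:]
	return strings.HasPrefix(tag, sha) || strings.HasPrefix(sha, tag)
}

// waitForImage polls getImage until the deployment's image matches sha.
// Only after it returns nil is a rollout status check meaningful.
func waitForImage(getImage func() (string, error), sha string, attempts int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		image, err := getImage()
		if err != nil {
			return err
		}
		if imageMatches(image, sha) {
			return nil
		}
		time.Sleep(delay)
	}
	return errors.New("deployment image never matched " + sha)
}

func main() {
	// Simulated sequence: the old image is reported twice before the new one.
	images := []string{
		"registry.example/app:0ld5ha1",
		"registry.example/app:0ld5ha1",
		"registry.example/app:abc1234",
	}
	n := 0
	err := waitForImage(func() (string, error) {
		img := images[n]
		if n < len(images)-1 {
			n++
		}
		return img, nil
	}, "abc1234", 5, 10*time.Millisecond)
	fmt.Println("image matched:", err == nil)
}
```

Checking the image first avoids the case where the rollout check succeeds against the previous ReplicaSet because the new spec has not been applied yet.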

### v0.5 verbs — net / dns / metrics / logs

Reachability + observability probes. Their value is *endpoint resolution* — the
non-obvious "which host, public or LB, what auth, what URL shape" reasoning you'd
otherwise re-derive every time — not the HTTP call itself. All reach internal
ingresses through the Traefik LB (the Go form of `curl --resolve host:443:10.0.20.203`).
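
A common way to get the effect of `curl --resolve` in Go is to override the transport's `DialContext` so every connection goes to a fixed address while the URL keeps the real hostname. The sketch below shows that technique; it is not taken from the CLI's source. `pinnedClient` and `hostSeenVia` are hypothetical names, and a local `httptest` server stands in for the LB so the example runs anywhere.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// pinnedClient returns a client that dials addr for every request, whatever
// host the URL names. The URL host is still used for the Host header, and on
// https URLs for TLS SNI and certificate verification.
func pinnedClient(addr string) *http.Client {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	return &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, network, _ string) (net.Conn, error) {
				return dialer.DialContext(ctx, network, addr)
			},
		},
	}
}

// hostSeenVia starts a local stub standing in for the LB, requests
// http://<host>/ through a client pinned to the stub, and returns the Host
// header the stub received.
func hostSeenVia(host string) (string, error) {
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Host)
	}))
	defer stub.Close()

	resp, err := pinnedClient(stub.Listener.Addr().String()).Get("http://" + host + "/")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// "app.example.internal" is a placeholder hostname.
	seen, err := hostSeenVia("app.example.internal")
	fmt.Println(seen, err)
}
```

Against the real LB the same client would be built as `pinnedClient("10.0.20.203:443")` and used with an `https://` URL, so the ingress routes on the hostname without any DNS lookup.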

| Command | Tier | What it does |
|---|---|---|
| `net check <host> [path]` | read | probes the host two ways — external (public DNS → Cloudflare) vs internal (Traefik LB) — with status + latency, so you can tell *where* a break is (CF? app? the LB path?) |
| `dns lookup <name> [type]` | read | resolves via Technitium (`10.0.20.201`) and public (`1.1.1.1`), diffed — surfaces split-horizon vs propagation gaps |
| `metrics query "<promql>"` | read | Prometheus instant query (`prometheus-query.viktorbarzin.lan`); prints `value {labels}` or `--json` |
| `metrics alerts` | read | currently-firing alerts (via the synthetic `ALERTS` series — the query frontend has no `/api/v1/alerts`) |
| `logs query "<logql>" [--since 1h] [--limit N]` | read | Loki range query (`loki.viktorbarzin.lan`); prints log lines or `--json` |
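
The two-resolver comparison behind `dns lookup` can be approximated with the standard library, as in this sketch. It is only an illustration: `resolverFor` and `diff` are hypothetical names, port 53 on both servers is assumed, only address lookups are shown (not arbitrary record types), and `example.com` is a placeholder.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"time"
)

// resolverFor returns a resolver that sends every query to server (host:port)
// instead of the system's configured DNS.
func resolverFor(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

// diff returns the records seen only by the internal resolver and only by the
// public one, each sorted.
func diff(internal, public []string) (onlyInternal, onlyPublic []string) {
	inPublic := map[string]bool{}
	for _, r := range public {
		inPublic[r] = true
	}
	inInternal := map[string]bool{}
	for _, r := range internal {
		inInternal[r] = true
		if !inPublic[r] {
			onlyInternal = append(onlyInternal, r)
		}
	}
	for _, r := range public {
		if !inInternal[r] {
			onlyPublic = append(onlyPublic, r)
		}
	}
	sort.Strings(onlyInternal)
	sort.Strings(onlyPublic)
	return onlyInternal, onlyPublic
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	name := "example.com" // placeholder name
	internal, errInt := resolverFor("10.0.20.201:53").LookupHost(ctx, name)
	public, errPub := resolverFor("1.1.1.1:53").LookupHost(ctx, name)
	fmt.Println("lookup errors:", errInt, errPub)

	onlyInt, onlyPub := diff(internal, public)
	fmt.Println("only internal:", onlyInt, "only public:", onlyPub)
}
```

Records present on only one side are what distinguish an intentional split-horizon answer from a change that has not propagated yet.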

Quote the PromQL/LogQL. These hit auth-free internal ingresses — no port-forward,
no kubectl. (In-cluster-only endpoints like Alertmanager stay out of scope; the
firing set is reachable via `ALERTS` instead.)
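
Reading the firing set from the `ALERTS` series comes down to an ordinary Prometheus instant query against `/api/v1/query`. The sketch below builds that URL and extracts alert names from a response. `instantQueryURL` and `firingAlerts` are hypothetical names, the exact PromQL the CLI sends is an assumption, and the sample response in `main` is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// instantQueryURL builds a Prometheus instant-query URL for promql.
func instantQueryURL(base, promql string) string {
	q := url.Values{}
	q.Set("query", promql)
	return strings.TrimRight(base, "/") + "/api/v1/query?" + q.Encode()
}

// firingAlerts extracts the alertname label from each series in an
// instant-query response body.
func firingAlerts(body []byte) ([]string, error) {
	var resp struct {
		Status string `json:"status"`
		Data   struct {
			Result []struct {
				Metric map[string]string `json:"metric"`
			} `json:"result"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("prometheus status %q", resp.Status)
	}
	var names []string
	for _, r := range resp.Data.Result {
		names = append(names, r.Metric["alertname"])
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	fmt.Println(instantQueryURL("https://prometheus-query.viktorbarzin.lan", `ALERTS{alertstate="firing"}`))

	// Made-up sample response, in the shape Prometheus returns for a vector.
	sample := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"ExampleAlert","alertstate":"firing"},"value":[0,"1"]}]}}`)
	names, err := firingAlerts(sample)
	fmt.Println(names, err)
}
```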

## Build / install

Built from source to `/usr/local/bin/homelab` during devvm provisioning
(`scripts/workstation/setup-devvm.sh`, the `t3-dispatch` pattern); version is
stamped from `cli/VERSION` via ldflags. Manual build:

```
cd cli && go build -ldflags "-X main.version=$(cat VERSION)" -o /usr/local/bin/homelab .
go test ./...
```

## Legacy webhook use-cases (preserved)

This binary is also the in-cluster `infra-cli` image. Invocations starting with
`-use-case=<vpn|setup-openwrt-dns|add-email-alias|...>` fall through to the
original flag-based path unchanged, so the webhook handler is unaffected.

## Design

See `infra/docs/adr/0004`–`0010` for the architecture decisions.