homelab: v0.5.0 — net/dns/metrics/logs probes (endpoint resolution)

The remaining verbs that pass the "saves reasoning, not just typing" test the
user posed mid-session: each encodes the non-obvious which-endpoint-reached-how
resolution otherwise re-derived every time. (Same test deprioritized node-ssh
and secret-get aliasing — thin wrappers over commands already known.)

- net check <host> [path]: two-legged reachability — external (public DNS→CF)
  vs internal (Traefik LB) — so you see WHERE a break is, not just that one path
  works. (live: surfaced the LB at 6ms vs CF 77ms.)
- dns lookup <name> [type]: Technitium (10.0.20.201) vs public (1.1.1.1) diff.
- metrics query "<promql>" / metrics alerts: Prometheus via the LB
  (prometheus-query.viktorbarzin.lan); alerts uses the synthetic ALERTS series
  since the query frontend has no /api/v1/alerts and Alertmanager has no ingress.
- logs query "<logql>" [--since 1h] [--limit N]: Loki range query via the LB.

All reach auth-free internal ingresses through the LB (Go form of
curl --resolve host:443:10.0.20.203) — no port-forward, no kubectl. In-cluster-
only endpoints (Alertmanager v2) deliberately out of scope. Verified live before
building; all five smoke-tested green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-06-19 11:27:31 +00:00
parent 9189560ac3
commit e91e1612dd
9 changed files with 466 additions and 3 deletions

View file

@ -112,6 +112,25 @@ remote, with retries that ride Woodpecker's intermittent empty responses.
step) is deferred to v0.4.1 — Woodpecker's per-pipeline detail/log endpoints were
the least reliable; `status`/`watch` use the list endpoint that works.
### v0.5 verbs — net / dns / metrics / logs
Reachability + observability probes. Their value is *endpoint resolution* — the
non-obvious "which host, public or LB, what auth, what URL shape" reasoning you'd
otherwise re-derive every time — not the HTTP call itself. All reach internal
ingresses through the Traefik LB (the Go form of `curl --resolve host:443:10.0.20.203`).
| Command | Tier | What it does |
|---|---|---|
| `net check <host> [path]` | read | probes the host two ways — external (public DNS → Cloudflare) vs internal (Traefik LB) — with status + latency, so you can tell *where* a break is (CF? app? the LB path?) |
| `dns lookup <name> [type]` | read | resolves via Technitium (`10.0.20.201`) and public (`1.1.1.1`), diffed — surfaces split-horizon vs propagation gaps |
| `metrics query "<promql>"` | read | Prometheus instant query (`prometheus-query.viktorbarzin.lan`); prints `value {labels}` or `--json` |
| `metrics alerts` | read | currently-firing alerts (via the synthetic `ALERTS` series — the query frontend has no `/api/v1/alerts`) |
| `logs query "<logql>" [--since 1h] [--limit N]` | read | Loki range query (`loki.viktorbarzin.lan`); prints log lines or `--json` |
Quote the PromQL/LogQL. These hit auth-free internal ingresses — no port-forward,
no kubectl. (In-cluster-only endpoints like Alertmanager stay out of scope; the
firing set is reachable via `ALERTS` instead.)
## Build / install
Built from source to `/usr/local/bin/homelab` during devvm provisioning
@ -131,4 +150,4 @@ original flag-based path unchanged, so the webhook handler is unaffected.
## Design
See `infra/docs/adr/0004``0009` for the architecture decisions.
See `infra/docs/adr/0004``0010` for the architecture decisions.