homelab: v0.5.0 — net/dns/metrics/logs probes (endpoint resolution)
The remaining verbs that pass the "saves reasoning, not just typing" test the user posed mid-session: each encodes the non-obvious which-endpoint-reached-how resolution otherwise re-derived every time. (Same test deprioritized node-ssh and secret-get aliasing — thin wrappers over commands already known.)

- net check <host> [path]: two-legged reachability — external (public DNS→CF) vs internal (Traefik LB) — so you see WHERE a break is, not just that one path works. (live: surfaced the LB at 6ms vs CF 77ms.)
- dns lookup <name> [type]: Technitium (10.0.20.201) vs public (1.1.1.1) diff.
- metrics query "<promql>" / metrics alerts: Prometheus via the LB (prometheus-query.viktorbarzin.lan); alerts uses the synthetic ALERTS series since the query frontend has no /api/v1/alerts and Alertmanager has no ingress.
- logs query "<logql>" [--since 1h] [--limit N]: Loki range query via the LB.

All reach auth-free internal ingresses through the LB (Go form of curl --resolve host:443:10.0.20.203) — no port-forward, no kubectl. In-cluster-only endpoints (Alertmanager v2) deliberately out of scope. Verified live before building; all five smoke-tested green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent
9189560ac3
commit
e91e1612dd
9 changed files with 466 additions and 3 deletions
@@ -289,7 +289,7 @@ curl -X POST -H "Authorization: token $TOK" -H 'Content-Type: application/json'
 ```

 ## Common Operations
-- **`homelab` CLI** (`/usr/local/bin/homelab`, source `cli/`): unified infra-ops verbs — run `homelab manifest` to discover the surface (each verb tagged read/write). Infra loop: `homelab tf plan|fmt|apply <stack>` (wraps `scripts/tg`; `apply` auto-claims presence + releases on exit, warns out-of-band), `homelab claim|release <kind>:<name>`, `homelab work start|land|clean <topic>` (worktree lifecycle; `land` gates on verification, `--verify-cmd`/`--no-verify`). Kubernetes (v0.2): `homelab k8s status|get|logs|describe|debug|pf|rollout-status <app>` (read; `<app>` defaults to the namespace, target to `deploy/<app>`), `homelab k8s db <app> [--mysql] -- "<SQL>"`, `k8s exec`, `k8s restart`, `k8s rm-pod` (pods/jobs only) — config-mutation kubectl verbs are intentionally absent (Terraform-only). Memory (v0.3): `homelab memory recall "<context>"` (semantic search), `memory list|categories|tags|stats|secret`, `memory store|update|delete` — a direct HTTP client to claude-memory that works even when the memory MCP is down. CI/deploy (v0.4): `homelab ci status|watch [commit]` (Woodpecker, repo resolved from cwd), `homelab deploy wait <ns>/<deploy> [--sha]` (image-sha + rollout) — `work land` now auto-watches CI to green. Full docs: `cli/README.md`.
+- **`homelab` CLI** (`/usr/local/bin/homelab`, source `cli/`): unified infra-ops verbs — run `homelab manifest` to discover the surface (each verb tagged read/write). Infra loop: `homelab tf plan|fmt|apply <stack>` (wraps `scripts/tg`; `apply` auto-claims presence + releases on exit, warns out-of-band), `homelab claim|release <kind>:<name>`, `homelab work start|land|clean <topic>` (worktree lifecycle; `land` gates on verification, `--verify-cmd`/`--no-verify`). Kubernetes (v0.2): `homelab k8s status|get|logs|describe|debug|pf|rollout-status <app>` (read; `<app>` defaults to the namespace, target to `deploy/<app>`), `homelab k8s db <app> [--mysql] -- "<SQL>"`, `k8s exec`, `k8s restart`, `k8s rm-pod` (pods/jobs only) — config-mutation kubectl verbs are intentionally absent (Terraform-only). Memory (v0.3): `homelab memory recall "<context>"` (semantic search), `memory list|categories|tags|stats|secret`, `memory store|update|delete` — a direct HTTP client to claude-memory that works even when the memory MCP is down. CI/deploy (v0.4): `homelab ci status|watch [commit]` (Woodpecker, repo resolved from cwd), `homelab deploy wait <ns>/<deploy> [--sha]` (image-sha + rollout) — `work land` now auto-watches CI to green. Net/obs (v0.5): `homelab net check <host> [path]` (external-CF vs internal-LB reachability), `dns lookup <name>` (Technitium vs public diff), `metrics query "<promql>"` / `metrics alerts` (Prometheus via LB), `logs query "<logql>" [--since]` (Loki via LB) — endpoint resolution baked in, no port-forward. Full docs: `cli/README.md`.
 - **Deploy new service**: Use `stacks/<existing-service>/` as template. Create stack, add DNS in tfvars, apply platform then service.
 - **Fix crashed pods**: Run healthcheck first. Safe to delete evicted/failed pods and CrashLoopBackOff pods with >10 restarts.
 - **OOMKilled**: Check `kubectl describe limitrange tier-defaults -n <ns>`. Increase `resources.limits.memory` in the stack's main.tf.

@@ -112,6 +112,25 @@ remote, with retries that ride Woodpecker's intermittent empty responses.
step) is deferred to v0.4.1 — Woodpecker's per-pipeline detail/log endpoints were
the least reliable; `status`/`watch` use the list endpoint that works.

### v0.5 verbs — net / dns / metrics / logs

Reachability + observability probes. Their value is *endpoint resolution* — the
non-obvious "which host, public or LB, what auth, what URL shape" reasoning you'd
otherwise re-derive every time — not the HTTP call itself. All reach internal
ingresses through the Traefik LB (the Go form of `curl --resolve host:443:10.0.20.203`).

| Command | Tier | What it does |
|---|---|---|
| `net check <host> [path]` | read | probes the host two ways — external (public DNS → Cloudflare) vs internal (Traefik LB) — with status + latency, so you can tell *where* a break is (CF? app? the LB path?) |
| `dns lookup <name> [type]` | read | resolves via Technitium (`10.0.20.201`) and public (`1.1.1.1`), diffed — surfaces split-horizon vs propagation gaps |
| `metrics query "<promql>"` | read | Prometheus instant query (`prometheus-query.viktorbarzin.lan`); prints `value {labels}` or `--json` |
| `metrics alerts` | read | currently-firing alerts (via the synthetic `ALERTS` series — the query frontend has no `/api/v1/alerts`) |
| `logs query "<logql>" [--since 1h] [--limit N]` | read | Loki range query (`loki.viktorbarzin.lan`); prints log lines or `--json` |

Quote the PromQL/LogQL. These hit auth-free internal ingresses — no port-forward,
no kubectl. (In-cluster-only endpoints like Alertmanager stay out of scope; the
firing set is reachable via `ALERTS` instead.)
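
As a rough illustration of the "URL shape" part of endpoint resolution, the sketch below builds the instant-query URL for the firing-alert set. The host and the `ALERTS{alertstate="firing"}` selector come from the text above; the helper name `instantQueryURL` is hypothetical and is not part of the CLI.

```go
package main

import (
	"fmt"
	"net/url"
)

// instantQueryURL is a hypothetical helper (not part of the homelab CLI): it
// builds a Prometheus instant-query URL for the given ingress host.
func instantQueryURL(host, promql string) string {
	v := url.Values{}
	v.Set("query", promql) // Encode percent-escapes braces, quotes and '='
	return "https://" + host + "/api/v1/query?" + v.Encode()
}

func main() {
	fmt.Println(instantQueryURL("prometheus-query.viktorbarzin.lan", `ALERTS{alertstate="firing"}`))
}
```

Building the URL is only half of the resolution: the request still has to be dialed at the LB address rather than at whatever the local resolver returns for that host.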

## Build / install

Built from source to `/usr/local/bin/homelab` during devvm provisioning

@@ -131,4 +150,4 @@ original flag-based path unchanged, so the webhook handler is unaffected.

 ## Design

-See `infra/docs/adr/0004`–`0009` for the architecture decisions.
+See `infra/docs/adr/0004`–`0010` for the architecture decisions.

@@ -1 +1 @@
-v0.4.0
+v0.5.0

83
cli/cmd_net.go
Normal file
@@ -0,0 +1,83 @@
package main

import (
	"fmt"
	"strings"
	"time"
)

func netCommands() []Command {
	return []Command{
		{Path: []string{"net", "check"}, Tier: TierRead,
			Summary: "reachability of <host>[/path]: external (public DNS→CF) vs internal (Traefik LB)", Run: netCheck},
		{Path: []string{"dns", "lookup"}, Tier: TierRead,
			Summary: "resolve <name> via Technitium (10.0.20.201) and public (1.1.1.1), diffed", Run: dnsLookup},
	}
}

func fmtProbe(code int, d time.Duration, err error) string {
	if err != nil {
		return "ERR " + err.Error()
	}
	return fmt.Sprintf("HTTP %d %dms", code, d.Milliseconds())
}

func netCheck(args []string) error {
	host, rest := firstPositional(args)
	if host == "" {
		return fmt.Errorf("usage: homelab net check <host> [path]")
	}
	path := "/"
	if len(rest) > 0 && !strings.HasPrefix(rest[0], "-") {
		path = rest[0]
		if !strings.HasPrefix(path, "/") {
			path = "/" + path
		}
	}
	u := "https://" + host + path
	fmt.Printf("%s\n", u)

	// external leg: resolve via public DNS, dial the public IP (tests the real CF path)
	pubOut, _ := dig(hostOnly(host), "1.1.1.1", "")
	if pubIP := firstLine(pubOut); pubIP != "" {
		c, d, e := probeURL(clientDialingIP(pubIP, 10*time.Second), u)
		fmt.Printf(" external (public %-15s) %s\n", pubIP, fmtProbe(c, d, e))
	} else {
		fmt.Println(" external (public) no public A record")
	}
	// internal leg: dial the Traefik LB directly
	c, d, e := probeURL(clientDialingIP(internalLBIP, 10*time.Second), u)
	fmt.Printf(" internal (LB %-15s) %s\n", internalLBIP, fmtProbe(c, d, e))
	return nil
}

func dnsLookup(args []string) error {
	name, rest := firstPositional(args)
	if name == "" {
		return fmt.Errorf("usage: homelab dns lookup <name> [A|AAAA|TXT|MX|PTR]")
	}
	rr := ""
	if len(rest) > 0 {
		rr = rest[0]
	}
	tech, _ := dig(name, "10.0.20.201", rr)
	pub, _ := dig(name, "1.1.1.1", rr)
	fmt.Printf("technitium (10.0.20.201): %s\n", oneLineList(tech))
	fmt.Printf("public (1.1.1.1) : %s\n", oneLineList(pub))
	if strings.TrimSpace(tech) != strings.TrimSpace(pub) {
		fmt.Println("⚠ mismatch — split-horizon (expected for internal-only apps) or a propagation gap")
	}
	return nil
}

func hostOnly(h string) string { // strip any path accidentally included
	return strings.SplitN(h, "/", 2)[0]
}

func oneLineList(s string) string {
	s = strings.TrimSpace(s)
	if s == "" {
		return "(none)"
	}
	return strings.ReplaceAll(s, "\n", ", ")
}
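
One thing to watch in the external leg: for a name that is a CNAME, `dig +short` typically prints the CNAME target before the addresses it resolves to, so the first output line is not always an IP. A minimal sketch of picking the first entry that parses as an IP instead, assuming the `firstLine` helper used by `netCheck` (defined outside this diff) simply returns the first line. `firstIP` is a hypothetical name, not something in this commit.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// firstIP returns the first line of `dig +short` output that parses as an IP
// address, skipping CNAME targets. Hypothetical helper for illustration.
func firstIP(digOut string) string {
	for _, line := range strings.Split(digOut, "\n") {
		line = strings.TrimSpace(line)
		if net.ParseIP(line) != nil {
			return line
		}
	}
	return "" // no address in the answer
}

func main() {
	// Example answer for a CNAME'd host (documentation-range addresses).
	out := "edge.example.net.\n192.0.2.10\n192.0.2.11"
	fmt.Println(firstIP(out))
}
```

Returning an empty string when no address is present keeps the existing "no public A record" branch usable unchanged.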
197
cli/cmd_obs.go
Normal file
@@ -0,0 +1,197 @@
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"sort"
	"strconv"
	"strings"
	"time"
)

const (
	promHost = "prometheus-query.viktorbarzin.lan"
	lokiHost = "loki.viktorbarzin.lan"
)

func obsCommands() []Command {
	return []Command{
		{Path: []string{"metrics", "query"}, Tier: TierRead,
			Summary: `Prometheus instant query: metrics query "<promql>" [--json]`, Run: metricsQuery},
		{Path: []string{"metrics", "alerts"}, Tier: TierRead,
			Summary: "list currently firing Prometheus alerts", Run: metricsAlerts},
		{Path: []string{"logs", "query"}, Tier: TierRead,
			Summary: `Loki query (last --since, default 1h): logs query "<logql>" [--since 1h] [--limit N] [--json]`, Run: logsQuery},
	}
}

// queryArg joins non-flag args into the query (PromQL/LogQL should normally be
// passed as a single quoted argument; this also tolerates unquoted multi-token).
func queryArg(args []string, valueFlags map[string]bool) string {
	var parts []string
	for i := 0; i < len(args); i++ {
		a := args[i]
		if valueFlags[a] {
			i++
			continue
		}
		if strings.HasPrefix(a, "-") {
			continue
		}
		parts = append(parts, a)
	}
	return strings.Join(parts, " ")
}

func labelStr(m map[string]string) string {
	name := m["__name__"]
	var kv []string
	for k, v := range m {
		if k != "__name__" {
			kv = append(kv, k+"="+v)
		}
	}
	sort.Strings(kv)
	return name + "{" + strings.Join(kv, ",") + "}"
}

func metricsQuery(args []string) error {
	q := queryArg(args, nil)
	if q == "" {
		return fmt.Errorf(`usage: homelab metrics query "<promql>" [--json]`)
	}
	v := url.Values{}
	v.Set("query", q)
	body, err := lbGetBody(promHost, "/api/v1/query", v)
	if err != nil {
		return err
	}
	if containsArg(args, "--json") {
		fmt.Println(string(body))
		return nil
	}
	var r struct {
		Data struct {
			Result []struct {
				Metric map[string]string `json:"metric"`
				Value  []interface{}     `json:"value"`
			} `json:"result"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &r); err != nil {
		fmt.Println(string(body))
		return nil
	}
	if len(r.Data.Result) == 0 {
		fmt.Println("(no series)")
		return nil
	}
	for _, s := range r.Data.Result {
		val := ""
		if len(s.Value) == 2 {
			val = fmt.Sprint(s.Value[1])
		}
		fmt.Printf("%-14s %s\n", val, labelStr(s.Metric))
	}
	return nil
}

func metricsAlerts(args []string) error {
	// prometheus-query is a query-only frontend (no /api/v1/alerts); the firing
	// set is exposed as the synthetic ALERTS series, queryable the normal way.
	v := url.Values{}
	v.Set("query", `ALERTS{alertstate="firing"}`)
	body, err := lbGetBody(promHost, "/api/v1/query", v)
	if err != nil {
		return err
	}
	if containsArg(args, "--json") {
		fmt.Println(string(body))
		return nil
	}
	var r struct {
		Data struct {
			Result []struct {
				Metric map[string]string `json:"metric"`
			} `json:"result"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &r); err != nil {
		fmt.Println(string(body))
		return nil
	}
	if len(r.Data.Result) == 0 {
		fmt.Println("(no firing alerts)")
		return nil
	}
	for _, a := range r.Data.Result {
		m := a.Metric
		scope := ""
		for _, k := range []string{"namespace", "deployment", "instance", "job", "node"} {
			if v := m[k]; v != "" {
				scope = k + "=" + v
				break
			}
		}
		fmt.Printf("%-9s %-34s %s\n", m["severity"], m["alertname"], scope)
	}
	return nil
}

func logsQuery(args []string) error {
	q := queryArg(args, map[string]bool{"--since": true, "--limit": true})
	if q == "" {
		return fmt.Errorf(`usage: homelab logs query "<logql>" [--since 1h] [--limit N] [--json]`)
	}
	since := flagValue(args, "--since")
	if since == "" {
		since = "1h"
	}
	dur, err := time.ParseDuration(since)
	if err != nil {
		return fmt.Errorf("bad --since %q: %w", since, err)
	}
	limit := flagValue(args, "--limit")
	if limit == "" {
		limit = "100"
	}
	end := time.Now()
	v := url.Values{}
	v.Set("query", q)
	v.Set("limit", limit)
	v.Set("start", strconv.FormatInt(end.Add(-dur).UnixNano(), 10))
	v.Set("end", strconv.FormatInt(end.UnixNano(), 10))
	body, err := lbGetBody(lokiHost, "/loki/api/v1/query_range", v)
	if err != nil {
		return err
	}
	if containsArg(args, "--json") {
		fmt.Println(string(body))
		return nil
	}
	var r struct {
		Data struct {
			Result []struct {
				Values [][]string `json:"values"`
			} `json:"result"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &r); err != nil {
		fmt.Println(string(body))
		return nil
	}
	n := 0
	for _, s := range r.Data.Result {
		for _, val := range s.Values {
			if len(val) == 2 {
				fmt.Println(val[1])
				n++
			}
		}
	}
	if n == 0 {
		fmt.Println("(no log lines)")
	}
	return nil
}
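
A note on `--since`: `time.ParseDuration` accepts units only up to hours (`h`), so a value such as `--since 7d` is rejected by `logsQuery` as written. If day-sized windows were wanted, one option is a small wrapper like the sketch below. The name `parseSince` and the `d` suffix handling are assumptions for illustration, not behaviour of this commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseSince extends time.ParseDuration with a whole-day suffix, e.g. "7d".
// Hypothetical helper; the committed code calls time.ParseDuration directly.
func parseSince(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		n, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil || n < 0 {
			return 0, fmt.Errorf("bad day count in %q", s)
		}
		return time.Duration(n) * 24 * time.Hour, nil
	}
	// Everything else keeps the standard syntax ("90m", "1h30m", ...).
	return time.ParseDuration(s)
}

func main() {
	for _, in := range []string{"1h", "2d", "soon"} {
		d, err := parseSince(in)
		fmt.Println(in, d, err)
	}
}
```

The sketch does not guard against overflow for very large day counts, which is probably acceptable for a log-window flag.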

@@ -18,6 +18,8 @@ func buildRegistry() []Command {
	reg = append(reg, memoryCommands()...)
	reg = append(reg, ciCommands()...)
	reg = append(reg, deployCommands()...)
	reg = append(reg, netCommands()...)
	reg = append(reg, obsCommands()...)
	return reg
}

76
cli/probe.go
Normal file
@@ -0,0 +1,76 @@
package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/url"
	"os/exec"
	"strings"
	"time"
)

// internalLBIP is the dedicated Traefik LB; every internal ingress routes through it.
const internalLBIP = "10.0.20.203"

// clientDialingIP returns an http.Client that dials ip for ANY host while keeping
// the URL host as SNI (so the cert matches) — the Go form of `curl --resolve
// host:443:ip`. TLS verification is skipped (these are reachability/observability
// probes, not security checks; internal .lan vhosts may serve a non-matching cert).
func clientDialingIP(ip string, timeout time.Duration) *http.Client {
	d := &net.Dialer{Timeout: 8 * time.Second}
	tr := &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			if i := strings.LastIndex(addr, ":"); i >= 0 {
				addr = ip + addr[i:]
			}
			return d.DialContext(ctx, network, addr)
		},
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}
	return &http.Client{Timeout: timeout, Transport: tr}
}

// probeURL issues a GET and returns status code + elapsed time.
func probeURL(c *http.Client, rawurl string) (int, time.Duration, error) {
	start := time.Now()
	resp, err := c.Get(rawurl)
	dur := time.Since(start)
	if err != nil {
		return 0, dur, err
	}
	resp.Body.Close()
	return resp.StatusCode, dur, nil
}

// lbGetBody GETs https://<host><path>?<q> through the internal LB and returns the body.
func lbGetBody(host, path string, q url.Values) ([]byte, error) {
	u := "https://" + host + path
	if len(q) > 0 {
		u += "?" + q.Encode()
	}
	resp, err := clientDialingIP(internalLBIP, 20*time.Second).Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	if resp.StatusCode >= 300 {
		return nil, fmt.Errorf("%s -> %d: %s", path, resp.StatusCode, strings.TrimSpace(string(body)))
	}
	return body, nil
}

// dig runs `dig +short` against a resolver, optionally for a record type.
func dig(name, server, rrtype string) (string, error) {
	args := []string{"+short", "+time=3", "+tries=1"}
	if rrtype != "" {
		args = append(args, rrtype)
	}
	args = append(args, name, "@"+server)
	out, err := exec.Command("dig", args...).Output()
	return strings.TrimSpace(string(out)), err
}
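
The address rewrite inside `clientDialingIP` can also be expressed with `net.SplitHostPort` and `net.JoinHostPort`, which additionally brackets an IPv6 target correctly. A sketch under that assumption; `rewriteDialAddr` is a hypothetical name, and the committed code uses `strings.LastIndex` instead.

```go
package main

import (
	"fmt"
	"net"
)

// rewriteDialAddr swaps the host part of a "host:port" dial address for ip
// and keeps the port. If addr carries no port it is returned unchanged.
// Hypothetical helper for illustration only.
func rewriteDialAddr(addr, ip string) string {
	_, port, err := net.SplitHostPort(addr)
	if err != nil {
		return addr
	}
	return net.JoinHostPort(ip, port)
}

func main() {
	fmt.Println(rewriteDialAddr("loki.viktorbarzin.lan:443", "10.0.20.203"))
}
```

Because only the dial address changes, the request URL (and therefore the TLS SNI and `Host` header) still names the original vhost, which is what lets Traefik route it.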
49
cli/probe_test.go
Normal file
@@ -0,0 +1,49 @@
package main

import "testing"

func TestQueryArg(t *testing.T) {
	if got := queryArg([]string{"up"}, nil); got != "up" {
		t.Errorf(`queryArg(["up"]) = %q, want "up"`, got)
	}
	if got := queryArg([]string{"up", "--json"}, nil); got != "up" {
		t.Errorf(`--json should be dropped, got %q`, got)
	}
	// single quoted PromQL arrives as one token
	if got := queryArg([]string{"count by (node) (up)", "--json"}, nil); got != "count by (node) (up)" {
		t.Errorf(`quoted query mangled: %q`, got)
	}
	// value-flags and their values are skipped, query survives
	vf := map[string]bool{"--since": true, "--limit": true}
	if got := queryArg([]string{`{app="x"}`, "--since", "1h", "--limit", "50"}, vf); got != `{app="x"}` {
		t.Errorf(`value-flag skipping failed: %q`, got)
	}
}

func TestLabelStr(t *testing.T) {
	got := labelStr(map[string]string{"__name__": "up", "job": "x", "instance": "y"})
	if got != "up{instance=y,job=x}" { // __name__ extracted, rest sorted
		t.Errorf("labelStr = %q", got)
	}
	if got := labelStr(map[string]string{"alertname": "Foo"}); got != "{alertname=Foo}" {
		t.Errorf("labelStr (no __name__) = %q", got)
	}
}

func TestOneLineList(t *testing.T) {
	if got := oneLineList(" "); got != "(none)" {
		t.Errorf("empty = %q, want (none)", got)
	}
	if got := oneLineList("a\nb"); got != "a, b" {
		t.Errorf("multi = %q, want 'a, b'", got)
	}
}

func TestHostOnly(t *testing.T) {
	if got := hostOnly("foo.me/path"); got != "foo.me" {
		t.Errorf("hostOnly = %q", got)
	}
	if got := hostOnly("foo.me"); got != "foo.me" {
		t.Errorf("hostOnly = %q", got)
	}
}
37
docs/adr/0010-homelab-net-obs-verbs.md
Normal file
@@ -0,0 +1,37 @@
# homelab net/dns/metrics/logs verbs: endpoint resolution as the unit of value

v0.5 adds `net`/`dns`/`metrics`/`logs`. These were chosen against an explicit
test the user posed mid-build: *does the verb save reasoning, or only typing?* A
wrapper over a command already known fluently (plain `ssh`, `vault kv get`) saves
keystrokes but not thought. These four save thought — the reasoning they encode
is **which endpoint, reached how, with what auth/URL shape** — re-derived every
time otherwise. (That same test deprioritized `node ssh` aliasing and `secret
get`, which are thin wrappers; see the session discussion.)

## Decisions

- **Internal ingresses, reached via the LB.** Everything routes through the
  Traefik LB by dialing `10.0.20.203` with the URL host preserved as SNI — the
  Go form of the house `curl --resolve host:443:10.0.20.203` pattern
  (`probe.go: clientDialingIP`). Verified live before building: Prometheus
  (`prometheus-query.viktorbarzin.lan`) and Loki (`loki.viktorbarzin.lan`) both
  answer JSON over the LB with **no auth gate and no port-forward** — so these
  stay clean HTTP clients, not kubectl wrappers.
- **`net check` is two-legged on purpose.** It resolves the host via public DNS
  (→ Cloudflare) AND dials the internal LB, reporting both — because the useful
  question is *where* a break is (CF edge vs the app vs the LB path), which a
  single curl can't answer. The external leg forces public resolution (the devvm
  resolver is split-horizon and would otherwise hit the LB for both).
- **`metrics alerts` uses the `ALERTS` series, not `/api/v1/alerts`.**
  `prometheus-query.*` is a query-only frontend (404 on `/api/v1/alerts`), and
  Alertmanager has no LB ingress (the alert-digest reads it in-cluster). Firing
  alerts are exposed as the synthetic `ALERTS{alertstate="firing"}` time series,
  queryable through the working endpoint — so no new dependency.
- **Deliberately NOT built:** in-cluster-only endpoints (Alertmanager v2,
  raw `*.svc` services) that would force port-forward/`kubectl run`. The
  reasoning-savings there don't beat the added moving parts; kept out of scope.
- **No `node`/`secret` group.** Same test: their high-volume parts are
  command-wrappers (low savings); only compound node ops (serial console, VM
  wait, fan-out) would qualify, and those are lower-frequency. Left unbuilt
  unless a concrete pain surfaces — the high-value deterministic surface
  (tf/work/ci/k8s/memory + these probes) is now covered.
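
To make the two-legged reasoning concrete, here is one way the pair of results from `net check` could be read. This is only a sketch: the `leg` type, the `diagnose` function, and the choice to treat any status below 500 as healthy are assumptions for illustration. The committed `net check` prints both legs and leaves the interpretation to the reader.

```go
package main

import "fmt"

// leg is the outcome of one probe: an HTTP status, or ok=false on a
// connection-level failure. Hypothetical type, not part of the CLI.
type leg struct {
	ok     bool
	status int
}

// healthy treats any response below 500 as "the path works" (an assumption:
// a 401 or 404 still proves the request reached the app).
func (l leg) healthy() bool { return l.ok && l.status < 500 }

// diagnose maps the two legs onto a rough guess at where a break sits.
func diagnose(external, internal leg) string {
	switch {
	case external.healthy() && internal.healthy():
		return "both paths healthy"
	case internal.healthy():
		return "external path broken (public DNS or Cloudflare edge)"
	case external.healthy():
		return "internal LB path broken"
	default:
		return "both paths failing (likely the app or Traefik itself)"
	}
}

func main() {
	fmt.Println(diagnose(leg{ok: false}, leg{ok: true, status: 200}))
}
```

A single curl yields only one of these legs, which is why it cannot distinguish the second and fourth cases.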