infra

Author	SHA1	Message	Date
Viktor Barzin	e03e4719ad	vault: distinguish Vaultwarden vs HashiCorp Vault, add `vault kv` `homelab vault` only spoke to Vaultwarden (the password manager), but the name reads as HashiCorp Vault (the infra secrets store — actually OpenBao here). Make the two unmistakable and support both. Distinction (no breakage — the existing Vaultwarden verbs are unchanged): - bare `homelab vault` help now LEADS with the two-stores split; - every verb summary is tagged `[vaultwarden]` or `[hashicorp-vault]`; - HashiCorp Vault/OpenBao lives under a clearly-named `vault kv` group. New `vault kv` (HashiCorp Vault / OpenBao, the secret/… KV store): - `kv get <path> [--field K]` — read; --field → one value (TTY-aware clipboard/stdout), no field → full secret JSON (refuses a bare TTY). - `kv list <path>` — list sub-paths (no values). - `kv put <path> <key>` — write one key; value via stdin (piped or no-echo prompt, never argv); creates the path or merges (never clobbers siblings; uses kv patch -method=rw so no `patch` cap needed). Critical: `kv` uses the caller's OWN Vault token (OIDC ~/.vault-token / $VAULT_TOKEN), NOT the per-user scoped Vaultwarden token (bound only to claude-users/<user>, which would 403 elsewhere) — handlers set VAULT_ADDR but never inject the scoped token. Access is whatever the policy grants. Logic in cmd_vault_kv.go (pure cores extractKVData/parseKVList/arg builders/kvGet/List/Put; file header documents the credential split). CLI v0.11.0. Tests: no value in put argv, create-then-merge, KV-v2 envelope strip, help names both systems. Verified e2e against live Vault (read key-names-only + a scratch put/merge/cleanup). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-28 11:09:33 +00:00
Viktor Barzin	12a45fa94e	vault: bw sync on every read so reads show the latest values Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details `bw unlock` only decrypts the LOCAL cache, so a persisted (already logged-in) session served stale data — a password changed in the web vault wouldn't appear until the next fresh login. Add a best-effort `bw sync` in openSession (the chokepoint every read shares: get, get --all, list, code, status), so reads reflect current server-side values. Best-effort by design: a transient sync failure warns on stderr and falls back to the cached vault rather than failing the read (an AFK agent shouldn't break on a network blip). status keeps its own explicit sync so a reachability failure still surfaces in its report. CLI v0.10.1. Tests assert the sync runs after unlock and before the read, and that a read still succeeds when sync fails. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-28 10:19:54 +00:00
Viktor Barzin	ccee443790	vault: add `get --all` to browse every field of an item Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details `homelab vault get` could only fetch one of five allow-listed fields and had no way to see what fields an item even has — in particular it could not reach arbitrary user-defined custom fields. Add a `--all` flag that dumps the whole item as a normalized JSON object (`{name, username?, password?, uris?, totp?, notes?, fields?}`), so a Claude session can discover and read every field, custom ones included, in a single call. Security model preserved: - Like `get --json`, the dump is all secret values, so it refuses a bare TTY (pipe it, e.g. `\| jq`); the machine/agent path is stdout. - The TOTP seed is reduced to a presence flag (`"totp": true`) and never emitted — the seed is more powerful than a one-time code, so the only seed-derived path stays the specially-audited `vault code`. Tests assert the seed and password-history never appear in the dump. - Op-log uses a distinct `get-all` verb (item name still never logged) so a bulk dump is distinguishable from a single-field read. `normalizeItem` is a pure, unit-tested core; `getItem` is the session+fetch seam. CLI bumped to v0.10.0. Docs: README changelog, onboarding runbook, design spec §16. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-28 10:01:49 +00:00
Viktor Barzin	9a1ab6247b	cli: add `homelab edges` — who-talks-to-whom investigation helper (v0.9.0) Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details Makes the goldmane_edges east-west trail (ADR-0014) reachable during incident investigations without remembering the DB/creds/SQL. New top-level verb: homelab edges --ns <ns> edges touching <ns> (either direction) homelab edges --src/--dst <ns> directional egress / ingress peers homelab edges --peers-of <ns> distinct peer namespaces of <ns> homelab edges --new-since 24h first seen since a duration or date (YYYY-MM-DD) homelab edges --denied only action='deny' (blocked / lateral movement) homelab edges --json --limit N machine-readable / row cap (default 200) Filters render to a single read-only SELECT against the `edge` table, run via the dbaas CNPG primary pod (same exec path as `k8s db`). Namespace values are validated to the k8s name charset (injection guard) before they reach SQL. TDD: edges_test.go covers flag parsing, query building (each filter, AND combination, peers-of shape, JSON wrapper), the new-since duration/date parser, and namespace-validation / injection rejection. Smoke-tested live: --peers-of, --new-since 24h, --denied, and --json all return correct rows. Docs: runbook query section now leads with the CLI; cli/README gains a v0.9 section. VERSION v0.8.2 -> v0.9.0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-28 09:51:41 +00:00
Viktor Barzin	0fa5852ec6	homelab v0.8.2: fix `memory recall` truncating multibyte UTF-8 mid-character Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details emo's Claude Code sessions hit "UserPromptSubmit hook error" on almost every prompt. Root cause: the homelab-memory-recall.py UserPromptSubmit hook runs `homelab memory recall <prompt>` and strict-decodes its stdout. printMemories truncated each memory's preview with a BYTE slice (c[:240]), which cuts through the middle of a 2-byte Cyrillic character and emits invalid UTF-8 (a dangling 0xd0 lead byte). The hook's subprocess.run(text=True) then raised UnicodeDecodeError — not caught by its `except (TimeoutExpired, OSError)` — so the hook exited non-zero and Claude surfaced the error. It is Cyrillic-specific (ASCII has no multibyte chars to split), so it bit emo (Bulgarian prompts) every turn while English users almost never saw it. Two-layer fix: - cli: truncatePreview() now counts RUNES, not bytes, so the preview never splits a character. Regression test asserts valid UTF-8 on a long Cyrillic string. Fixes the root for every consumer of `memory recall` / `memory list`. - hook: subprocess.run gains errors="replace" and the except is broadened to honor the script's own "best-effort, exit 0" contract — so a truncated or otherwise odd payload can never again surface as a hook error. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-28 09:40:51 +00:00
Viktor Barzin	7dbbb74163	homelab v0.8.1: frame browser as escalation (default headless), match CLAUDE.md Some checks failed ci/woodpecker/push/default Pipeline was successful Details Build infra CLI / build (push) Has been cancelled Details Make `homelab browser --help` and chrome-service.md state the same tiered rule now in ~/code/CLAUDE.md: default to the Playwright MCP/headless browser for all routine automation; reach for `homelab browser` ONLY when headless is blocked (loads-but-submit-fails / one request errors while siblings 200 / explicit bot wall). Removes the "co-equal choice" framing so agents have one non-conflicting instruction. Adds a test asserting the tiered wording so it can't regress. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 15:44:43 +00:00
Viktor Barzin	a6b52a5839	homelab v0.8.0: browser verbs for headful anti-bot web automation Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details Add `homelab browser run\|open` so agents can drive the cluster's headful Chrome (chrome-service) over CDP from the devvm. The headless playwright/mcp browser can load anti-bot sites and fill their forms, but the gated submit silently fails — e.g. the Stirling Ackroyd Fixflo tenant portal returned net::ERR_FILE_NOT_FOUND on its pre-submit check and hung, creating nothing. Driving the real headful Chrome submits first try. That capability already existed but was undiscoverable, so it cost ~40 min + redundant form re-runs to find; now it is one command, versioned, test-covered, and `browser --help` carries the when-to-use signature + an error-code cheat-sheet so the right tool is reached at the right moment (the failure was judgment, not setup). - port-forward svc/chrome-service:9222 (tunnels API-server->pod, so it bypasses the :9222 NetworkPolicy), assert non-headless via /json/version, connect_over_cdp, inject the same vendored stealth.js the in-cluster callers use; the port-forward is always torn down, on success and on error. - node CDP client pinned to playwright-core@1.48.2 to match the v1.48.0-noble image (Chromium 130); self-provisioned lazily into ~/.cache/homelab, no per-user setup. - default is a fresh incognito context (safe for the shared browser + concurrent callers); --shared-context reuses the warmed persistent profile. - TDD: cmd_browser_test.go covers arg parsing, headless detection, the version pin, the help cheat-sheet, and a stealth.js drift guard. Verified end-to-end against bot.sannysoft.com (real Chrome UA, webdriver hidden, plugins/WebGL spoofed) and `browser open`. - docs: README v0.8 section, ADR-0013, and a chrome-service.md "driving from outside the cluster" section. Closes: code-nepg Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 12:22:22 +00:00
Viktor Barzin	b1bbe42821	homelab ha token: dedicated openclaw/ha-tokens secret + least-priv RBAC for emo Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details `ha token` originally read openclaw/openclaw-secrets -> skill_secrets, which only cluster admins can read — so it hung/failed for the non-admin operator it was built for (emo = emil.barzin@gmail.com, OIDC group "Home Server Admins", whose identity is deliberately barred from secrets in the openclaw namespace). Split the HA tokens into a dedicated secret openclaw/ha-tokens (keys sofia/london) with a Role + RoleBinding granting `get` on JUST that secret to the Home Server Admins group (k8s RBAC can't scope to a JSON sub-key, hence a separate object). emo now resolves the HA token with their own identity, WITHOUT gaining the rest of skill_secrets (slack_webhook, uptime_kuma_password). openclaw's own deployment keeps reading openclaw-secrets — purely additive. - stacks/openclaw/ha_tokens.tf: new secret + least-privilege Role/RoleBinding - cli/cmd_ha.go: read openclaw/ha-tokens (raw base64 per-instance key); drop JSON parse - README + ADR-0012 updated; VERSION -> v0.7.1 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 10:45:32 +00:00
Viktor Barzin	48225f2dea	homelab CLI v0.7: add `ha token` + `ha ssh` for Home Assistant Mined another devvm user's Claude sessions for repeated, hand-rolled command patterns worth absorbing into the shared CLI. The dominant signal was Home Assistant "Sofia" work: a `kubectl \| base64 \| jq` token-extraction pipeline re-derived ~420x, and a bespoke non-interactive `ssh -o …` invocation reinvented ~30x — every session. The existing `home-assistant-sofia.py` already covers the API but goes unused from an arbitrary cwd (needs an env var set + a cwd-relative path), so agents bypassed it and hand-rolled everything. Add two verbs covering exactly the gaps the `ha` MCP can't (entity state/control stays with the MCP): - `ha token [--instance sofia\|london]` (read): resolves the long-lived API token live from k8s secret openclaw/openclaw-secrets via the ambient kubeconfig — no pre-set env var. Composes as `curl -H "Authorization: Bearer $(homelab ha token)"`. - `ha ssh [--instance sofia\|london] -- <cmd>` (write): deterministic non-interactive ssh to the HA host using the invoking user's key. Also fix the root cause: `home-assistant-sofia.py` now falls back to `homelab ha token` when its env var is unset (works from any directory), and the home-assistant skill points agents at these verbs + `homelab metrics query` instead of hand-rolled curls. README + ADR-0012 + AGENTS.md updated per the per-verb-group convention. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-20 23:46:09 +00:00
Viktor Barzin	3e3fdb34f0	homelab: v0.6.0 — usage telemetry (usage top), evidence-driven verb prioritization Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details Answers the question that drove the whole CLI — which verbs to add next — with data instead of one maintainer's habits, and resolves the cross-user-usage ask in-bounds (no reading anyone's home). - emit on dispatch: every verb fire-and-forgets one Loki line {job,user,verb} + "exit=N ver=X". ONLY the verb path + exit code — never args, paths, flags, or secrets (the emit never sees arguments). Best-effort: 800ms timeout, errors swallowed, never affects the command; opt-out HOMELAB_TELEMETRY=0. Discovery verbs (manifest/version/help) and usage itself don't self-record. - usage top [--since 30d] [--user U] [--json]: ranks verbs via sum by (verb)(count_over_time({job="homelab-usage"}[…])) against the shared Loki. Cross-user analytics WITHOUT touching ~/.claude — the privacy-preserving answer to "what does the team use". - Loki sink (zero new infra, dogfoods v0.5 logs path); push verified HTTP 204 no auth. ADR docs/adr/0011. Live-verified: ran 4 verbs, usage top ranked them correctly (metrics query=2). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-19 22:29:01 +00:00
Viktor Barzin	e91e1612dd	homelab: v0.5.0 — net/dns/metrics/logs probes (endpoint resolution) The remaining verbs that pass the "saves reasoning, not just typing" test the user posed mid-session: each encodes the non-obvious which-endpoint-reached-how resolution otherwise re-derived every time. (Same test deprioritized node-ssh and secret-get aliasing — thin wrappers over commands already known.) - net check <host> [path]: two-legged reachability — external (public DNS→CF) vs internal (Traefik LB) — so you see WHERE a break is, not just that one path works. (live: surfaced the LB at 6ms vs CF 77ms.) - dns lookup <name> [type]: Technitium (10.0.20.201) vs public (1.1.1.1) diff. - metrics query "<promql>" / metrics alerts: Prometheus via the LB (prometheus-query.viktorbarzin.lan); alerts uses the synthetic ALERTS series since the query frontend has no /api/v1/alerts and Alertmanager has no ingress. - logs query "<logql>" [--since 1h] [--limit N]: Loki range query via the LB. All reach auth-free internal ingresses through the LB (Go form of curl --resolve host:443:10.0.20.203) — no port-forward, no kubectl. In-cluster- only endpoints (Alertmanager v2) deliberately out of scope. Verified live before building; all five smoke-tested green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-19 11:27:31 +00:00
Viktor Barzin	9189560ac3	homelab: v0.4.0 — ci/deploy verbs (watch what you trigger) Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details Adds the verb-group that kills the single biggest reasoning sink in agent sessions — watching a build/deploy to completion (proven the session that built it: hours hand-rolling Woodpecker polling + DB-schema spelunking for one CI incident). - ci status/watch: Woodpecker REST API (version-stable, not its DB schema), reached via the internal Traefik LB (dial 10.0.20.203, SNI=ci.viktorbarzin.me so the cert verifies — the Go form of the house `curl --resolve` pattern), token from WOODPECKER_TOKEN/Vault, repo id resolved from the cwd remote, with retries that ride Woodpecker's intermittent empty responses. watch matches the HEAD/given commit (avoids the post-push race) and exits non-zero on failure. - deploy wait: image-sha match THEN rollout status (rollout status alone returns success on the old ReplicaSet); kubectl-based. - work land now auto-watches CI to green on the landed commit (--no-ci-watch to skip), closing the v0.1 gap. - ci logs deferred to v0.4.1 (Woodpecker detail/log endpoints were the least reliable; status/watch use the working list endpoint). Live-verified ci status/watch against the live API. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-19 10:59:14 +00:00
Viktor Barzin	787ce4edfa	homelab: v0.3.1 — fix k8s db PG target (resolve CNPG primary pod, not the Service) Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details `k8s db <app>` (Postgres path) execed `pg-cluster-rw`, which is the CNPG read-write SERVICE, not a pod — so kubectl exec failed with `pods "pg-cluster-rw" not found`. The unit test only checked the plan; the verb was never fired at live state (the gap flagged in v0.2), so it shipped broken. Fix: the PG plan now carries a label selector (cnpg.io/instanceRole=primary) instead of a pod name, and k8s db resolves the actual primary POD via `kubectl get pod -l <selector>` before exec. MySQL path (real pod mysql-standalone-0) unchanged. Live-verified both paths (psql + mysql). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-19 09:09:34 +00:00
Viktor Barzin	48b63ffa6f	homelab: add memory verb-group (v0.3.0) — direct claude-memory HTTP client Some checks failed Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline failed Details Lets agents search/navigate memory via the CLI, as the first step toward deprecating the memory MCP. claude-memory is a FastAPI service (the MCP is just one frontend); homelab memory is a thin Bearer-auth HTTP client over the same API, using the env the hooks already set (CLAUDE_MEMORY_API_URL/KEY). It works even when the MCP frontend is down — the recurring disconnect that took the MCP offline for this whole session. Verbs: recall (server-side semantic search), list, categories, tags, stats, secret (read); store, update, delete (write). Validated against the live API including a store→recall→delete round-trip — full data-plane parity with the MCP. The deprecation itself (rewiring the per-prompt auto-recall + auto-learn hooks to the CLI, then uninstalling the MCP) is a deliberate follow-up, sequenced after the CLI is proven in the hooks — see docs/adr/0008. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-19 05:56:25 +00:00
Viktor Barzin	3594485f77	homelab: v0.2.0 — docs + version for the k8s verb-group Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details Bump cli/VERSION to v0.2.0; document the k8s verbs (README table + resolver note), add docs/adr/0007 (resolver, read/write split, config-mutation stays raw, db dbaas pattern), and extend the AGENTS.md discovery pointer with the Kubernetes surface. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-18 22:30:41 +00:00
Viktor Barzin	66caa0bf7f	homelab: v0.1 docs, distribution wiring, and version Some checks are pending Build infra CLI / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details Completes v0.1: documentation, build/install path, and version stamping. - cli/VERSION (v0.1.0) stamped into the binary via ldflags. - cli/README.md rewritten as the homelab overview (verbs + tiers, manifest, build, the preserved legacy webhook use-cases). - docs/adr/0004-0006: why homelab exists (grown in place from infra/cli, not a separate repo), v0.1 scope + everything-allowed/tiers-recorded, and the work/tf behaviour (native worktree entry, verification-gated auto-land, presence-coupled apply). - setup-devvm.sh builds cli/ -> /usr/local/bin/homelab each provisioning run (t3-dispatch pattern), so every devvm user gets the current binary. - AGENTS.md: discovery pointer under Common Operations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-18 19:25:51 +00:00

16 commits