Viktor wanted a web UI on the claude service to act as his breakglass when the devvm is down: open it, have Claude SSH in to diagnose/repair, and power-cycle the VM via the Proxmox host if needed. This is the app half (the infra stack + host bootstrap live in the infra repo).

New, ISOLATED ASGI app under app/breakglass/ (it never imports app.main, so the untrusted-input agents, recruiter-triage and nextcloud-todos, can't share a process with the root-on-devvm / PVE-reset SSH key):

- pve.py: the LLM-independent power-verb path (status|forensics|reset|stop|start|cycle on VM 102), whitelist-validated client-side, executed over the forced-command SSH key (list argv, no shell).
- agent_session.py: multi-turn streamed chat via claude -p --session-id / --resume with --output-format stream-json, translated to a small SSE vocabulary (session/text/tool/result/error/done).
- auth.py: edge Authentik header OR bearer; fail-closed.
- server.py: FastAPI (session/chat-SSE/pve-verb routes) + serves the Svelte UI.
- Svelte SPA (frontend/, built into app/breakglass/static/ and committed; no in-cluster build, per ADR-0002): streamed chat + danger-styled manual VM controls with confirm-on-mutate.
- agents/breakglass.md: narrow tools (Bash/Read/Grep/Glob, no web), taught the ssh devvm / ssh pve aliases and cycle-vs-reset.
- docker-entrypoint-breakglass.sh: ssh-agent bootstrap from the mounted key + ssh aliases, then uvicorn app.breakglass.server. The breakglass Deployment overrides the image CMD with this; the existing service is untouched.

26 new tests (verb whitelist incl. injection attempts, stream-json→SSE translation, auth gating, route behaviour); full suite 58 green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
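The pve.py path described above is not part of the file shown here. The following is a minimal sketch of how a client-side verb whitelist and a list-argv SSH call could fit together. It assumes the client sends only the verb, and that the forced command on the Proxmox side reads it from SSH_ORIGINAL_COMMAND and applies it to VM 102. The names validate_verb, build_pve_argv and run_pve_verb are hypothetical, not the real pve.py API. The defaults mirror PVE_USER, PVE_HOST and PVE_VERB_TIMEOUT_SECONDS from the config module.

```python
import subprocess

# Hypothetical sketch, not the real pve.py. The verbs are the ones the commit lists.
ALLOWED_VERBS = frozenset({"status", "forensics", "reset", "stop", "start", "cycle"})


def validate_verb(verb: str) -> str:
    """Return `verb` unchanged if whitelisted, else raise ValueError.

    Exact membership only: no strip(), no lower(). Inputs such as
    "status; reboot", "STATUS" or a verb with a trailing newline are
    rejected rather than normalised.
    """
    if verb not in ALLOWED_VERBS:
        raise ValueError(f"verb not allowed: {verb!r}")
    return verb


def build_pve_argv(verb: str, user: str = "root", host: str = "192.168.1.127") -> list[str]:
    """Build the ssh command as a list (never a shell string).

    Defaults mirror PVE_USER / PVE_HOST; the real app may use an ssh alias instead.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        f"{user}@{host}",
        validate_verb(verb),  # assumed: the forced command reads this from SSH_ORIGINAL_COMMAND
    ]


def run_pve_verb(verb: str, timeout: int = 120) -> subprocess.CompletedProcess:
    """Run one verb; a wedged host raises subprocess.TimeoutExpired instead of hanging."""
    return subprocess.run(
        build_pve_argv(verb),
        capture_output=True,
        text=True,
        timeout=timeout,  # mirrors PVE_VERB_TIMEOUT_SECONDS
        check=False,  # caller inspects returncode/stderr itself
    )
```

Because argv is a list and subprocess.run is called without shell=True, no local shell ever parses the verb. The exact-match whitelist is what keeps a string like "reset; reboot" from reaching the Proxmox host at all. For defence in depth, the forced command could re-check SSH_ORIGINAL_COMMAND against the same set.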
36 lines
1.9 KiB
Python
"""Environment-driven config for the breakglass app.

Targets are hardcoded IPs by default (the breakglass must not depend on cluster
DNS; it has to work when things are broken). Everything is overridable via env
for tests and future re-IPing.
"""
import os

# SSH targets. IPs, not names: no DNS dependency in an incident.
DEVVM_HOST = os.environ.get("BREAKGLASS_DEVVM_HOST", "10.0.10.10")
DEVVM_USER = os.environ.get("BREAKGLASS_DEVVM_USER", "breakglass")
PVE_HOST = os.environ.get("BREAKGLASS_PVE_HOST", "192.168.1.127")
PVE_USER = os.environ.get("BREAKGLASS_PVE_USER", "root")

# The Claude agent the breakglass UI drives. Narrow tool surface, no web tools.
BREAKGLASS_AGENT = os.environ.get("BREAKGLASS_AGENT", "breakglass")
DEFAULT_MODEL = os.environ.get("BREAKGLASS_MODEL", "sonnet")

# Where claude session state + per-session scratch live. emptyDir in prod.
SESSIONS_DIR = os.environ.get("BREAKGLASS_SESSIONS_DIR", "/workspace/sessions")

# A single human operator per incident: no need for the job-runner's fan-out.
MAX_CONCURRENT_TURNS = int(os.environ.get("BREAKGLASS_MAX_CONCURRENT_TURNS", "2"))
# A chat turn that runs longer than this is killed (the agent is wedged).
TURN_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_TURN_TIMEOUT_SECONDS", "1800"))
# A single PVE power verb must return fast; a wedged host shouldn't hang the UI.
PVE_VERB_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_PVE_VERB_TIMEOUT_SECONDS", "120"))

# Auth. The app sits behind the ingress `auth = "required"` resilience proxy
# (Authentik SSO, basic-auth fallback when Authentik is down). We additionally
# accept a bearer token for machine/CLI callers. Either gate is sufficient;
# the edge is the primary one for the browser UI.
API_TOKEN = os.environ.get("API_BEARER_TOKEN", "")
# Header the auth-proxy injects for an authenticated human (set by Authentik, or
# by the basic-auth fallback's `$remote_user`). Presence ⇒ edge-authenticated.
TRUSTED_USER_HEADER = "x-authentik-username"
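auth.py is not part of this file, but API_TOKEN and TRUSTED_USER_HEADER define its inputs. Below is a framework-agnostic sketch of a fail-closed check over those two settings. is_authorized is a hypothetical name, not the real auth.py API. The sketch assumes headers arrive with lowercase keys, as ASGI delivers them. It also assumes the ingress strips any client-supplied x-authentik-username before proxying; otherwise the header gate could be spoofed.

```python
import hmac
from typing import Mapping

# Hypothetical stand-ins for the config values; an empty API_TOKEN disables the bearer gate.
API_TOKEN = ""
TRUSTED_USER_HEADER = "x-authentik-username"


def is_authorized(
    headers: Mapping[str, str],
    api_token: str = API_TOKEN,
    trusted_header: str = TRUSTED_USER_HEADER,
) -> bool:
    """Fail-closed: True only for a non-empty edge username OR a matching bearer token."""
    # Gate 1: the auth proxy injected a non-blank username header.
    if headers.get(trusted_header, "").strip():
        return True
    # Gate 2: bearer token, only when one is configured. An unset token must
    # never match a missing or empty bearer.
    if not api_token:
        return False
    scheme, _, presented = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # Constant-time comparison; bytes avoid TypeError on non-ASCII input.
    return hmac.compare_digest(presented.encode(), api_token.encode())
```

In FastAPI this could sit in a dependency that calls is_authorized(request.headers) and raises HTTPException(status_code=401) on False. Starlette's Headers is a case-insensitive Mapping with .get, so it fits the parameter type. Every branch that is not a positive match returns False, which is what "fail-closed" means here.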