claude-breakglass: in-cluster warm break-glass UI for the devvm

Stand up the infra for Viktor's break-glass: when the devvm is wedged but the
cluster is still healthy, open breakglass.viktorbarzin.me, have Claude SSH in
to diagnose/fix, and power-cycle VM 102 via the Proxmox host if needed. The
app half landed in the claude-agent-service repo.

New stack stacks/claude-breakglass/ — own namespace + SA, NO Vault role (ESO
syncs only its key, so the pod has zero direct Vault access). Hardened to
survive the pressure it exists to fix: priorityClassName tier-0-core, broad
node-pressure tolerations, anti-affinity off node1, imagePullPolicy Always.
auth="required" ingress so it rides the Authentik resilience proxy and stays
reachable via the basic-auth fallback during an auth-stack outage. Runs the
shared claude-agent-service image with the breakglass entrypoint.
files/breakglass-pve is the PVE forced-command (status|forensics|reset|stop|
start|cycle on VM 102, forensics-first).
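
For illustration only, the hardening settings listed above might look roughly
like this with the Terraform kubernetes provider. This is a sketch, not the
stack's actual code: the resource name, labels, image reference, the specific
toleration keys, and expressing "off node1" as node affinity on the hostname
label are all assumptions.

```hcl
# Hypothetical sketch; names and values below are illustrative assumptions.
resource "kubernetes_deployment" "breakglass" {
  metadata {
    name      = "claude-breakglass"
    namespace = "claude-breakglass"
  }

  spec {
    replicas = 1

    selector {
      match_labels = { app = "claude-breakglass" }
    }

    template {
      metadata {
        labels = { app = "claude-breakglass" }
      }

      spec {
        # Give the pod a high scheduling priority relative to other workloads.
        priority_class_name = "tier-0-core"

        # Tolerate node-pressure taints so the pod can still be scheduled
        # onto a stressed node (assumed subset of the "broad" tolerations).
        toleration {
          key      = "node.kubernetes.io/memory-pressure"
          operator = "Exists"
          effect   = "NoSchedule"
        }
        toleration {
          key      = "node.kubernetes.io/disk-pressure"
          operator = "Exists"
          effect   = "NoSchedule"
        }

        # Keep the pod off node1 (assumed to be done via node affinity).
        affinity {
          node_affinity {
            required_during_scheduling_ignored_during_execution {
              node_selector_term {
                match_expressions {
                  key      = "kubernetes.io/hostname"
                  operator = "NotIn"
                  values   = ["node1"]
                }
              }
            }
          }
        }

        container {
          name              = "breakglass"
          image             = "example.invalid/claude-agent-service:latest" # placeholder
          image_pull_policy = "Always"
        }
      }
    }
  }
}
```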

Isolation: the shared claude-agent pod's terraform-state Vault policy is
explicitly DENIED secret/claude-breakglass/* (stacks/vault/main.tf) so a
prompt-injected agent on that pod can't read the root-on-devvm key.

traefik: add a checksum/auth-proxy-htpasswd annotation so the auth-proxy rolls
when the emergency basic-auth password rotates (the htpasswd file is a subPath
mount, which doesn't auto-update). The password was regenerated this session
so Viktor has a known emergency credential, which the auth-stack-outage
failure domain requires.
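
As a rough sketch of the checksum-annotation pattern described above: hashing
the htpasswd content into a pod-template annotation changes the template
whenever the password changes, which triggers a rollout. The variable,
resource name, labels, and image below are hypothetical; the real traefik
stack may wire this differently (for example through Helm values).

```hcl
# Hypothetical sketch; variable and resource names are assumptions.
variable "auth_proxy_htpasswd" {
  description = "Rendered htpasswd file content for the emergency basic-auth user"
  type        = string
  sensitive   = true
}

resource "kubernetes_deployment" "auth_proxy" {
  metadata {
    name      = "auth-proxy"
    namespace = "traefik"
  }

  spec {
    selector {
      match_labels = { app = "auth-proxy" }
    }

    template {
      metadata {
        labels = { app = "auth-proxy" }

        annotations = {
          # The hash changes whenever the htpasswd content changes, so the
          # pod template changes and the Deployment rolls new pods that
          # re-read the subPath-mounted file.
          "checksum/auth-proxy-htpasswd" = sha256(var.auth_proxy_htpasswd)
        }
      }

      spec {
        container {
          name  = "auth-proxy"
          image = "example.invalid/auth-proxy:latest" # placeholder
        }
      }
    }
  }
}
```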

Docs: docs/runbooks/breakglass-ui.md (full incident + bootstrap procedure,
incl. the per-host from= NAT quirks) and a security.md note recording the two
new privileged footholds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor Barzin 2026-06-12 21:40:17 +00:00
parent 02785987dd
commit 32cf75635f
7 changed files with 629 additions and 0 deletions

@@ -598,6 +598,19 @@ resource "vault_policy" "terraform_state" {
path "secret/metadata/vault" {
  capabilities = ["deny"]
}
# Explicit deny on the breakglass SSH key (added with the claude-breakglass
# stack, 2026-06-12). That key grants root-on-devvm + PVE VM-102 power
# verbs; it must NOT be readable by the shared claude-agent pod, whose
# agents (recruiter-triage, nextcloud-todos-exec) ingest untrusted input
# with Bash. The breakglass pod runs in its own namespace with NO Vault
# role and gets the key via ESO only. See
# claude-agent-service/docs/adr/0001-breakglass-security-architecture.md.
path "secret/data/claude-breakglass/*" {
  capabilities = ["deny"]
}
path "secret/metadata/claude-breakglass/*" {
  capabilities = ["deny"]
}
EOT
}