break-glass SSH: drop port-knock for exposed key-only :52222; version host config
Viktor got locked out of the break-glass path (forgot the port-knock setup) and deleted the edge-router forwards, then asked to review and redesign it from scratch.

Root cause of the lockout: the knock added no real security (key-only SSH is already brute-force-proof) and its only benefit, hiding the port, came at the cost of a circular dependency. The knock sequence lived only in in-cluster Vault, which is unreachable in the exact away/cold scenario break-glass exists for. So the unlock secret was unavailable precisely when needed.

New model (self-contained, nothing to remember): plain key-only SSH on the Proxmox host's :52222, openly reachable. The edge router forwards WAN tcp/52222 -> 192.168.1.127:52222 (external port MUST equal internal on the TP-Link AX6000 - it rejects remaps; port 22 itself is reserved). The exposed port trusts only a dedicated break-glass key via `Match LocalPort` (a leak of any other root key does not grant internet access), rate-limited (iptables hashlimit) + fail2ban.

- Removed knockd (package + config) and the legacy Synology SSH forward (ext 3333 -> .13:22, a needless WAN exposure the original plan wanted gone).
- Fixed the fail2ban jail for Debian 13 (auth logs under sshd-session, not sshd - the stock journalmatch silently never banned).
- Versioned the host config in scripts/ (it was applied ad-hoc, never committed) and recorded the deliberate Wave-1 "no public-IP" exception in security.md + .claude/CLAUDE.md.

Superseded the 2026-05-30 port-knock design docs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent e2788d1b2d
commit df332b59e6
9 changed files with 989 additions and 1 deletions
158
docs/runbooks/breakglass-ssh.md
Normal file
@@ -0,0 +1,158 @@
# Runbook: Break-glass SSH

Cold-survivable, brute-force-proof SSH onto the home LAN for when the Kubernetes
cluster and its remote-access tunnels (Headscale, cloudflared) are down but the
**Proxmox host + edge router are up**. Redesigned 2026-06-11; the previous
port-knock design is decommissioned (see "History" below).

## Model (as built)

```
your laptop (anywhere) ── ssh -p 52222 ──▶ edge router 192.168.1.1
                                             │  WAN tcp/52222 ─▶ 192.168.1.127:52222
                                             ▼
                                       Proxmox host 192.168.1.127
                                         sshd :52222 (key-only, break-glass key ONLY)
                                           → full LAN via ssh -J / ssh -D
```

- **No port-knock.** Plain `ssh -p 52222`. The SSH key is the only gate.
- **Key-only**, brute-force-proof. The exposed `:52222` trusts **only** the
  dedicated break-glass key (`/root/.ssh/authorized_keys.breakglass`), separate
  from root's normal LAN-admin keys, so it is independently revocable and a leak
  of any other root key does not grant access from the internet.
- **Rate-limited** per source IP (iptables hashlimit) + **fail2ban**. These trim
  scanner noise only; key-only auth is the real protection.
- **Exposed, not hidden.** `:52222` answers on the WAN (Shodan-visible). This is
  a deliberate, documented exception to the Wave-1 "no public-IP access" policy
  (see `docs/architecture/security.md`), chosen for self-containment: it has **no
  dependency on the cluster** (unlike Headscale/cloudflared) and nothing to
  remember (unlike the old knock, whose sequence lived only in in-cluster Vault).
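
The per-port key isolation described above is the kind of thing an sshd `Match LocalPort` block expresses. The deployed drop-in (`scripts/sshd-10-breakglass.conf`) is not reproduced in this runbook, so the following is only a minimal sketch of what such a file might contain. Every directive value is an assumption (including port 22 as the LAN-side port), `OUT_DIR` is a hypothetical variable, and the script writes to a scratch directory, not `/etc/ssh`.

```bash
#!/usr/bin/env bash
# Sketch only: NOT the deployed scripts/sshd-10-breakglass.conf.
# Writes an illustrative drop-in to a scratch directory so nothing on the
# host is touched. OUT_DIR is a hypothetical variable for this example.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
conf="$OUT_DIR/10-breakglass.conf"

cat > "$conf" <<'EOF'
# Listen on the normal LAN port and on the break-glass port.
# (Port is a global directive; it cannot appear inside a Match block.)
Port 22
Port 52222

# Connections arriving on :52222 may use only the break-glass key.
Match LocalPort 52222
    PubkeyAuthentication yes
    PasswordAuthentication no
    KbdInteractiveAuthentication no
    AuthorizedKeysFile /root/.ssh/authorized_keys.breakglass
EOF

echo "wrote $conf"
```

On a real host a file like this would be checked with `sshd -t` before any reload, as the deploy steps in this runbook already do.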

## Secrets (Vault `secret/viktor`)

| Key | Use |
|---|---|
| `breakglass_ssh_pubkey` | authorized on the host (`authorized_keys.breakglass`) |
| `breakglass_ssh_privkey` | the private key (also on your laptop at `~/.ssh/breakglass_ed25519`) |

The key has **no passphrase** (so it works in a true cold event without anything
to recall). Treat the private key as the sole credential and guard the laptop copy.

> Leftover: `breakglass_knock_sequence` is dead (knock decommissioned). It is
> inert; remove it when you have a Vault token with the `patch` capability
> (`vault kv patch` / merge-patch; the everyday token lacks it).

## Connect

Client `~/.ssh/config`:

```
Host breakglass
  # DDNS name; follows the dynamic WAN IP
  HostName viktorbarzin.ddns.net
  Port 52222
  User root
  IdentityFile ~/.ssh/breakglass_ed25519
  IdentitiesOnly yes
```

Then:

```bash
ssh breakglass                      # shell on the Proxmox host
ssh -J breakglass root@10.0.20.1    # jump to pfSense (or any LAN host)
ssh -D 1080 breakglass              # SOCKS5 → reach any internal IP
```

There is **no `bg()` knock function** anymore; delete it from your shell rc if
you added it under the old design.

## Cold-event IP cheat sheet (cluster DNS is down)

| Host | IP |
|---|---|
| Proxmox host | `192.168.1.127` |
| pfSense | `10.0.20.1` (WAN `192.168.1.2`) |
| k8s API | `10.0.20.100` |
| Synology NAS | `192.168.1.13` (reach via `ssh -J breakglass`) |
| edge router | `192.168.1.1` |
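
Because cluster DNS is unavailable in a cold event, it can help to pin some of these addresses in the client SSH config ahead of time. The sketch below is one possible way to do that. The `Host` aliases (`bg-pfsense`, `bg-nas`), the `add_jump_host` helper and the `OUT_FILE` variable are made up for illustration, and it assumes the `breakglass` host entry from the Connect section already exists.

```bash
# Sketch only: alias names, helper name and output location are hypothetical.
# Generates ssh_config entries that reach LAN hosts by IP through the
# existing "breakglass" host, so no internal DNS lookup is needed.
out="${OUT_FILE:-$(mktemp)}"

add_jump_host() {
  # usage: add_jump_host ALIAS IP
  printf 'Host %s\n  HostName %s\n  ProxyJump breakglass\n\n' "$1" "$2" >> "$out"
}

add_jump_host bg-pfsense 10.0.20.1
add_jump_host bg-nas     192.168.1.13

cat "$out"
```

Appending the output to `~/.ssh/config` would allow, for example, `ssh root@bg-pfsense` in place of the explicit `-J` form. The login user is left unset in the generated entries because this runbook only names one (`root` on pfSense).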

## Deploy / re-provision the host config

Source of truth lives in `infra/scripts/`. To (re)deploy, from `infra/` (the
`scripts/` paths below are relative to it):

```bash
# 1. break-glass key authorized for the exposed port
PUB="$(vault kv get -field=breakglass_ssh_pubkey secret/viktor)"
ssh root@192.168.1.127 "printf '%s\n' '$PUB' > /root/.ssh/authorized_keys.breakglass && chmod 600 /root/.ssh/authorized_keys.breakglass"

# 2. sshd drop-in (dual-port, Match-isolated); validate before reload (anti-lockout)
scp scripts/sshd-10-breakglass.conf root@192.168.1.127:/etc/ssh/sshd_config.d/10-breakglass.conf
ssh root@192.168.1.127 'sshd -t && systemctl reload ssh'

# 3. firewall (rate-limit) + boot unit
scp scripts/breakglass-firewall.sh root@192.168.1.127:/usr/local/sbin/breakglass-firewall.sh
ssh root@192.168.1.127 'chmod 0755 /usr/local/sbin/breakglass-firewall.sh && systemctl enable --now breakglass-firewall.service'

# 4. fail2ban jail
scp scripts/fail2ban-breakglass-sshd.local root@192.168.1.127:/etc/fail2ban/jail.d/breakglass-sshd.local
ssh root@192.168.1.127 'systemctl restart fail2ban && fail2ban-client status sshd'
```

The `breakglass-firewall.service` unit (oneshot, `RemainAfterExit=yes`, ordered
after `network-pre.target`) is a manual host unit; recreate it if the host is
rebuilt:

```ini
[Unit]
Description=Break-glass base firewall (key-only SSH on :52222)
After=network-pre.target
Wants=network-pre.target
[Service]
Type=oneshot
ExecStart=/usr/local/sbin/breakglass-firewall.sh
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
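
`breakglass-firewall.sh` itself is not shown in this runbook. As a rough illustration of the per-source-IP rate limit it is described as applying, a hashlimit rule pair could look like the sketch below. The rate (6 new connections per minute), the burst, the hashlimit table name and the `run` wrapper are assumptions, not the deployed values, and the script defaults to printing the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Sketch only: NOT the deployed breakglass-firewall.sh.
# Rate, burst and hashlimit table name are assumed values for illustration.
PORT=52222
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the iptables commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Drop NEW connections from any single source IP that exceed the rate.
run iptables -A INPUT -p tcp --dport "$PORT" \
  -m conntrack --ctstate NEW \
  -m hashlimit --hashlimit-name breakglass --hashlimit-mode srcip \
  --hashlimit-above 6/minute --hashlimit-burst 6 \
  -j DROP

# Everything else to the port is accepted; sshd's key-only auth is the gate.
run iptables -A INPUT -p tcp --dport "$PORT" -j ACCEPT
```

Running it as-is only prints the two commands. With `DRY_RUN=0` (as root) each run appends the rules again, which suits a oneshot boot unit but not repeated manual runs.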

## Edge-router forward (manual; live device, not Terraform)

TP-Link Archer AX6000 (`192.168.1.1`) → Advanced → NAT Forwarding → Port
Forwarding. The break-glass rule:

| Service Name | Device IP | External Port | Internal Port | Protocol |
|---|---|---|---|---|
| `breakglass-ssh` | `192.168.1.127` | `52222` | `52222` | TCP |

**AX6000 quirks (learned 2026-06-11; do not relearn the hard way):**
- **External port must equal internal port.** The firmware rejects any remap
  (e.g. `22 → 52222`) with *"External Port: This item conflicts with existed
  ones."* Hence ext==int 52222.
- **Port 22 is reserved**: even `22 → 22` is refused. Break-glass cannot use 22.
- **Row delete is immediate** (no confirm dialog): clicking the trash icon
  removes the rule and toasts "Operation succeeded".
- Automation: `~/wizard/tools/insecure-browse/add-forward.{sh,js}` (dockerized
  Playwright; double-gated save `DRY_RUN=0 CONFIRM_SAVE=1`; supports
  `RULES_JSON` add, `EDIT_RULES_JSON` protocol-edit, `DELETE_RULES_JSON`
  identity-guarded delete). Router password: Vault
  `secret/viktor/edge_router_192_168_1_1_password`.
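
After adding or re-adding the rule, the forward can be sanity-checked from a network outside the LAN. The helper below is a small sketch and is not part of the repo. `port_open` is a hypothetical name, and it relies on bash's `/dev/tcp` pseudo-device and coreutils `timeout` being available on the client.

```bash
# Sketch only: port_open is a hypothetical helper, not something in the repo.
# Returns 0 if a TCP connection to HOST:PORT succeeds within the timeout.
port_open() {
  # usage: port_open HOST PORT [TIMEOUT_SECONDS]
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example, `port_open viktorbarzin.ddns.net 52222 && echo reachable`, run from a phone hotspot or another external connection. A successful TCP connect only shows that the forward and the sshd listener are up; it says nothing about key authentication.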

## Rotate / revoke

- **Revoke instantly:** remove the line from `/root/.ssh/authorized_keys.breakglass`.
- **Rotate the key:** `ssh-keygen -t ed25519 -a 100 -N '' -f ~/.ssh/breakglass_ed25519`
  (`-N ''` keeps the key passphrase-less, as described under "Secrets"),
  `vault kv patch secret/viktor breakglass_ssh_privkey=@... breakglass_ssh_pubkey=...`
  (needs a token with the `patch` capability, which the everyday token lacks),
  then redeploy step 1 above.
- **Router reset wipes forwards:** re-add the `breakglass-ssh` rule above.

## History

- **2026-05-30:** original design: key-only SSH on `:52222` gated behind a
  **UDP port-knock** (knockd). Decommissioned 2026-06-11: the knock added no real
  security (the SSH key already makes the port brute-force-proof) and its only
  benefit, hiding the port, came at the cost of a **circular dependency**: the
  knock sequence lived only in in-cluster Vault, unreachable in the exact
  cold/away scenario break-glass exists for. That caused a real lockout. The
  knockd package + config + the legacy Synology SSH forward (ext 3333 → .13:22)
  were removed.