break-glass SSH: drop port-knock for exposed key-only :52222; version host config
Viktor got locked out of the break-glass path (forgot the port-knock setup) and deleted the edge-router forwards, then asked to review and redesign it from scratch.

Root cause of the lockout: the knock added no real security (key-only SSH is already brute-force-proof) and its only benefit, hiding the port, came at the cost of a circular dependency. The knock sequence lived only in in-cluster Vault, which is unreachable in the exact away/cold scenario break-glass exists for. So the unlock secret was unavailable precisely when needed.

New model (self-contained, nothing to remember): plain key-only SSH on the Proxmox host's :52222, openly reachable. The edge router forwards WAN tcp/52222 -> 192.168.1.127:52222 (external port MUST equal internal on the TP-Link AX6000 - it rejects remaps; port 22 itself is reserved). The exposed port trusts only a dedicated break-glass key via `Match LocalPort` (a leak of any other root key does not grant internet access), rate-limited (iptables hashlimit) + fail2ban.

- Removed knockd (package + config) and the legacy Synology SSH forward (ext 3333 -> .13:22, a needless WAN exposure the original plan wanted gone).
- Fixed the fail2ban jail for Debian 13 (auth logs under sshd-session, not sshd - the stock journalmatch silently never banned).
- Versioned the host config in scripts/ (it was applied ad-hoc, never committed) and recorded the deliberate Wave-1 "no public-IP" exception in security.md + .claude/CLAUDE.md.

Superseded the 2026-05-30 port-knock design docs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent e2788d1b2d
commit df332b59e6
9 changed files with 989 additions and 1 deletion
@@ -255,6 +255,8 @@ Routed via **Loki ruler → Alertmanager → `#security` Slack receiver**. Same
|
|||
|
||||
**Policy: no public-IP access ever.** Vault, kube-apiserver, PVE sshd must transit a trusted LAN or Headscale. Anything else fires an alert.
|
||||
|
||||
**Documented exception — break-glass SSH (2026-06-11):** one deliberate carve-out. The Proxmox host's sshd listens on a WAN-exposed `:52222` (edge-router forward), **key-only**, trusting only a dedicated break-glass key (`Match LocalPort` → `authorized_keys.breakglass`), rate-limited (iptables hashlimit) + fail2ban. It is intentionally reachable from the public internet so it survives a cluster/tunnel outage with no dependency on the cluster — the one case the "must transit LAN/Headscale" rule cannot serve. Brute-force-proof (no password); the trade is Shodan-visibility. As-built: `docs/runbooks/breakglass-ssh.md`; rationale: `docs/plans/2026-06-11-breakglass-ssh-redesign-design.md`. (Replaced the 2026-05-30 port-knock variant, which was non-scannable but had a circular Vault dependency that caused a lockout.)
|
||||
|
||||
#### Why no canary tokens
|
||||
|
||||
Original plan included canary tokens (fake K8s Secret, Vault KV path, PVE file, sinkhole hostname). Rejected because Viktor routinely greps `secret/viktor` (135 keys) and lists `kubectl get secret -A` — any read-trigger canary self-fires. Use-based canaries (zero-RBAC SA tokens with audit alerts on use) were also considered but rejected in favor of cleaner source-IP anomaly detection (K9, V7) on REAL tokens — same threat model, no fake-token operational burden.
|
||||
|
|
|
|||
285
docs/plans/2026-05-30-breakglass-ssh-access-design.md
Normal file
|
|
@@ -0,0 +1,285 @@
|
|||
# Break-Glass SSH Access — Design
|
||||
|
||||
> **⚠️ SUPERSEDED 2026-06-11** by `2026-06-11-breakglass-ssh-redesign-design.md`.
|
||||
> The port-knock was removed: it added no real security (the SSH key already
|
||||
> makes the port brute-force-proof) and its knock sequence lived only in
|
||||
> in-cluster Vault — unreachable in the exact cold/away scenario break-glass
|
||||
> exists for, which caused a real lockout. Retained for history. As-built:
|
||||
> `docs/runbooks/breakglass-ssh.md`.
|
||||
|
||||
- **Date**: 2026-05-30
|
||||
- **Status**: Draft — pending user review
|
||||
- **Owner**: Viktor
|
||||
- **Related**: `docs/architecture/vpn.md`, `docs/architecture/security.md`, `infra/.claude/CLAUDE.md` (Security Posture Wave 1)
|
||||
|
||||
## 1. Goal
|
||||
|
||||
Provide a **cold, brute-force-proof backdoor onto the home LAN from the public
|
||||
internet** for the case where the Kubernetes cluster and every cluster-hosted
|
||||
remote-access path are down (cloudflared, Headscale/Tailscale, in-cluster
|
||||
WireGuard), but the **Proxmox host, pfSense, and the edge router are still up**.
|
||||
|
||||
### Hard requirements (from the user)
|
||||
|
||||
1. **Cold-survivable**: must work when the k8s cluster + all its tunnels are
|
||||
down. The path must touch **nothing in the cluster** (no Authentik, Traefik,
|
||||
Technitium/AdGuard DNS, cloudflared).
|
||||
2. **Full LAN access** once connected (SSH to Proxmox host, pfSense, Synology,
|
||||
k8s API, etc.).
|
||||
3. **No brute force**: no password-guessable surface.
|
||||
4. **Client uses only software pre-installed on Linux/macOS** — no WireGuard /
|
||||
Tailscale / fwknop client install. Stock `ssh` (+ `bash`) only.
|
||||
5. **Minimal effort**, and ideally **honor the locked Wave 1 policy**
|
||||
(`no public-IP access — … PVE sshd must transit LAN or Headscale`).
|
||||
|
||||
## 2. Decision
|
||||
|
||||
**Key-only SSH to the Proxmox host, gated behind a UDP port-knock.**
|
||||
|
||||
- The Proxmox host (`192.168.1.127`) is the entry point — it's the recovery box
|
||||
(`virsh`/`qm` to reboot the pfSense VM, `kubectl`, full hypervisor control)
|
||||
and it sits directly on the `192.168.1.0/24` segment, so the path **does not
|
||||
traverse pfSense or the cluster** — it survives a wedged pfSense too, not just
|
||||
a down cluster.
|
||||
- SSH is the only externally-usable remote tool **pre-installed on every
|
||||
Linux/macOS box**, satisfying requirement 4.
|
||||
- **Key-only auth** (no passwords anywhere) makes password brute force
|
||||
impossible → requirement 3.
|
||||
- A **port-knock** keeps the external SSH port **closed/invisible to scanners**
|
||||
until a knock sequence is sent. This restores the "no standing public service"
|
||||
property we'd have had with WireGuard and keeps us within the **intent** of the
|
||||
Wave 1 policy (PVE sshd is not internet-scannable). The knock is sent with a
|
||||
**bash `/dev/udp` one-liner** — zero install.
|
||||
|
||||
### Alternatives rejected
|
||||
|
||||
| Option | Why rejected |
|---|---|
| WireGuard road-warrior on pfSense | Needs a WireGuard **client app** (fails requirement 4). Was the prior design. |
| Tailscale / Headscale | Client app + control plane is in-cluster (dies cold). |
| Browser → web admin UI (Proxmox/pfSense/Synology) | "Pre-installed" (browser) but password-based → brute-forceable, far larger attack surface than a key-only SSH port. |
| Plain **exposed** key-only SSH (no knock) | Brute-force-proof, but a **publicly visible** service (Shodan-catalogued) and a standing violation of the Wave 1 "no public PVE sshd" policy. The knock removes the standing exposure for ~15 min more setup. |
| fwknop / cryptographic SPA | Strongest hiding, but needs a **client install** (fails requirement 4). |
|
||||
|
||||
## 3. Architecture
|
||||
|
||||
```
|
||||
Your laptop (anywhere) — stock ssh + bash, nothing installed
|
||||
│ (1) UDP knock sequence → bash: echo > /dev/udp/<pub>/<port> (instant, no handshake)
|
||||
│ (2) ssh -p 52222 root@<pub>
|
||||
▼
|
||||
Edge router 192.168.1.1 (the box the stored password unlocks)
|
||||
│ forwards: UDP <k1>,<k2>,<k3> + TCP 52222 → 192.168.1.127
|
||||
▼
|
||||
Proxmox host 192.168.1.127 ← path bypasses pfSense entirely
|
||||
├─ knockd (libpcap) sees the UDP knock → opens TCP 52222 for your source IP (30 s)
|
||||
├─ sshd listens on :22 (LAN admin, always) AND :52222 (external, knock-gated), key-only
|
||||
└─ once in: virsh/qm (reboot pfSense VM), kubectl, ssh -J / ssh -D → full LAN
|
||||
```
|
||||
|
||||
**Why it meets "cold + full LAN":** the host is up by definition of the chosen
|
||||
failure mode; nothing in the path depends on k8s, pfSense, or DNS. From the host
|
||||
you reach the whole LAN either directly (it's on `192.168.1.0/24` and routes to
|
||||
the VLANs via pfSense when pfSense is up) or by using SSH's built-in
|
||||
`-J`/`-D` — both stock, no install.
|
||||
|
||||
## 4. Components
|
||||
|
||||
### 4.1 Edge router @ 192.168.1.1 (manual, in the browser)
|
||||
Add port-forwards (same place the existing `51821` WireGuard forward lives):
|
||||
- **TCP 52222 → 192.168.1.127:52222** (external SSH; no port rewrite — see §4.3 rationale)
|
||||
- **UDP `<k1>`, `<k2>`, `<k3>` → 192.168.1.127** (knock ports; actual numbers in Vault)
|
||||
|
||||
If the router supports a **port range** forward, a single range covering the
|
||||
knock ports + 52222 is tidier than four rules.
|
||||
|
||||
> **Verify (#1 implementation check):** whether `.1` **preserves the source IP**
|
||||
> on forwarded packets (typical DNAT) or **SNATs** them to `192.168.1.1`. Test by
|
||||
> knocking + connecting from an external network and checking `/var/log/auth.log`
|
||||
> + `knockd` syslog for the observed source IP. The design works either way (see
|
||||
> §4.3), but it determines knock granularity.
|
||||
|
||||
### 4.2 SSH keys & Vault layout
|
||||
- Mint a **dedicated** break-glass keypair (ed25519), separate from
|
||||
`secret/viktor/proxmox_ssh_key`, so it's independently revocable and clearly
|
||||
labelled.
|
||||
- **Public key** → `/root/.ssh/authorized_keys` on the Proxmox host (no `from=`
|
||||
restriction — break-glass is from-anywhere; the knock + key are the gate).
|
||||
- **Private key** → Vault `secret/viktor/breakglass_ssh_privkey` (for
|
||||
re-provisioning) **and** on your laptop at `~/.ssh/breakglass_ed25519`
|
||||
(chmod 600).
|
||||
- **Knock sequence** → Vault `secret/viktor/breakglass_knock_sequence` (kept out
|
||||
of git — obscurity value only; see §5).
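
A minimal sketch of a pre-flight check for the laptop-side key file described above, assuming GNU or BSD `stat` is available. The function name `check_key_perms` is hypothetical and not part of the original design; it only confirms that the private key exists and carries mode 600 before it is relied on.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the design): succeed only if the key file
# exists and its mode is exactly 600, as the layout above requires.
check_key_perms() {
  local file="$1" mode
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'. Try both.
  mode="$(stat -c '%a' "$file" 2>/dev/null || stat -f '%Lp' "$file" 2>/dev/null)"
  [ "$mode" = "600" ] || { echo "mode is $mode, expected 600: $file" >&2; return 1; }
}
```

Possible use: `check_key_perms ~/.ssh/breakglass_ed25519 && ssh breakglass`.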
|
||||
|
||||
### 4.3 Proxmox host — sshd hardening
|
||||
`/etc/ssh/sshd_config.d/10-breakglass.conf`:
|
||||
```
|
||||
Port 22
|
||||
Port 52222
|
||||
PasswordAuthentication no
|
||||
KbdInteractiveAuthentication no
|
||||
PubkeyAuthentication yes
|
||||
PermitRootLogin prohibit-password # key-only root (PVE recovery norm)
|
||||
MaxAuthTries 3
|
||||
LoginGraceTime 20
|
||||
```
|
||||
- sshd listens on **:22 (LAN admin, always allowed)** and **:52222 (external,
|
||||
knock-gated)**. Using a dedicated external port (not a DNAT rewrite to 22)
|
||||
lets the firewall distinguish LAN vs external **regardless of `.1` SNAT
|
||||
behaviour** (§4.1) — LAN admin on `:22` is never affected by the gate.
|
||||
- **Default to root key-only** for recovery practicality. *Alternative for
|
||||
review:* a dedicated `breakglass` sudo user instead of root.
|
||||
|
||||
> **Verify (#2):** key login already works for your normal access **before**
|
||||
> `PasswordAuthentication no` is committed — no lockout. (Backup rsync jobs
|
||||
> already use keys, so this is likely already effectively true.)
|
||||
|
||||
### 4.4 Host firewall (knock gate)
|
||||
Default-drop the external SSH port; knockd punches a per-source hole. LAN admin
|
||||
(`:22`) and established sessions are untouched:
|
||||
```
|
||||
# allow established / related
|
||||
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
|
||||
# LAN admin + backups: SSH on :22 always allowed
|
||||
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
|
||||
# external SSH on :52222 closed by default — knockd opens it per-source
|
||||
iptables -A INPUT -p tcp --dport 52222 -j DROP
|
||||
```
|
||||
- **knockd uses libpcap**, so it sees the UDP knock packets even though iptables
|
||||
drops them — the knock ports stay **silent/closed** to scanners.
|
||||
- **pve-firewall coexistence (verify #3):** confirm whether the PVE firewall is
|
||||
enabled. If it is, express these rules through it (or a dedicated chain) so a
|
||||
pve-firewall reload doesn't wipe the knockd-managed rule. Default PVE installs
|
||||
often have it off at datacenter level.
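
The gate relies on iptables evaluating rules in order and stopping at the first match: knockd *inserts* its per-source ACCEPT at position 1, ahead of the DROP that was appended earlier. The following is a toy model of that ordering in plain bash, not real iptables; the function `first_match` and the `src:dport:verdict` rule encoding are invented purely for illustration.

```bash
#!/usr/bin/env bash
# Toy first-match evaluator (illustration only; does not touch iptables).
# Each rule is "src:dport:verdict"; "*" as src matches any source address.
first_match() {
  local src="$1" dport="$2" rule r_src r_port verdict
  shift 2
  for rule in "$@"; do
    IFS=: read -r r_src r_port verdict <<<"$rule"
    if { [ "$r_src" = "*" ] || [ "$r_src" = "$src" ]; } && [ "$r_port" = "$dport" ]; then
      echo "$verdict"
      return 0
    fi
  done
  echo "POLICY"   # no rule matched; the chain policy would decide
}

# Base state: only the appended DROP for :52222.
base=("*:52222:DROP")
# After a valid knock from 203.0.113.7, an ACCEPT sits at position 1.
knocked=("203.0.113.7:52222:ACCEPT" "${base[@]}")

first_match 203.0.113.7 52222 "${base[@]}"      # prints DROP
first_match 203.0.113.7 52222 "${knocked[@]}"   # prints ACCEPT
first_match 198.51.100.9 52222 "${knocked[@]}"  # prints DROP
```

The third call shows the per-source property: a different address still hits the DROP while the knocker's window is open.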
|
||||
|
||||
### 4.5 knockd
|
||||
`apt install knockd` (Debian/PVE). `/etc/knockd.conf`:
|
||||
```
|
||||
[options]
|
||||
UseSyslog
|
||||
Interface = vmbr0 # the 192.168.1.127 interface
|
||||
|
||||
[breakglass]
|
||||
sequence = <k1>:udp,<k2>:udp,<k3>:udp # real ports from Vault
|
||||
seq_timeout = 10
|
||||
start_command = /usr/sbin/iptables -I INPUT 1 -s %IP% -p tcp --dport 52222 -j ACCEPT
|
||||
cmd_timeout = 30
|
||||
stop_command = /usr/sbin/iptables -D INPUT -s %IP% -p tcp --dport 52222 -j ACCEPT
|
||||
```
|
||||
- **UDP knock** → the client knock is fire-and-forget (`/dev/udp`), no TCP-hang
|
||||
on the client (a TCP knock to a dropped port would block until timeout).
|
||||
- Opens `:52222` for the knocker's source IP for **30 s**; an SSH session
|
||||
established within that window **persists** via conntrack ESTABLISHED after the
|
||||
rule is removed. Enable + start the `knockd` service.
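
Because the real knock ports are kept out of git, the config above has to be rendered from the stored sequence at install time. A minimal sketch of that rendering, assuming the sequence is a comma-separated string of three ports; the function name `render_knockd_conf` is hypothetical, while the directives mirror the config shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a knockd.conf for a "p1,p2,p3" UDP knock sequence.
render_knockd_conf() {
  local k1 k2 k3
  [[ "$1" =~ ^[0-9]+,[0-9]+,[0-9]+$ ]] || { echo "expected p1,p2,p3" >&2; return 1; }
  IFS=, read -r k1 k2 k3 <<<"$1"
  # Unquoted heredoc delimiter: the port variables expand here,
  # while %IP% stays literal for knockd to substitute at runtime.
  cat <<EOF
[options]
UseSyslog
Interface = vmbr0

[breakglass]
sequence = ${k1}:udp,${k2}:udp,${k3}:udp
seq_timeout = 10
start_command = /usr/sbin/iptables -I INPUT 1 -s %IP% -p tcp --dport 52222 -j ACCEPT
cmd_timeout = 30
stop_command = /usr/sbin/iptables -D INPUT -s %IP% -p tcp --dport 52222 -j ACCEPT
EOF
}
```

Rejecting malformed input before writing anything keeps a bad Vault value from producing a config that knockd would load with a broken sequence.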
|
||||
|
||||
### 4.6 fail2ban (defense-in-depth)
|
||||
`apt install fail2ban`, sshd jail (watches `auth.log`, bans repeat failures).
|
||||
Local to the host, **no cluster dependency**. Catches anything that gets past the
|
||||
knock to the sshd listener.
|
||||
|
||||
### 4.7 Client side (laptop — stock tools only)
|
||||
`~/.ssh/config`:
|
||||
```
|
||||
Host breakglass
|
||||
HostName <public-ip-or-dyndns>
|
||||
Port 52222
|
||||
User root
|
||||
IdentityFile ~/.ssh/breakglass_ed25519
|
||||
```
|
||||
Knock + connect — a shell function using **bash builtins only** (works on
|
||||
macOS `/bin/bash` + Linux; UDP send is instant):
|
||||
```sh
|
||||
bg() {
|
||||
local host=<public-ip-or-dyndns>
|
||||
for p in <k1> <k2> <k3>; do echo -n x > "/dev/udp/$host/$p"; sleep 0.4; done
|
||||
sleep 0.5
|
||||
ssh breakglass "$@"
|
||||
}
|
||||
```
|
||||
- **Full LAN, no install:** `ssh -J breakglass <internal-host>` (jump), or
|
||||
`ssh -D 1080 breakglass` then point a browser/`curl` at SOCKS5 `127.0.0.1:1080`
|
||||
to reach any internal IP. From the host shell you already have everything.
|
||||
- *Optional fully-transparent variant:* fold the knock into a `ProxyCommand` in
|
||||
the `Host breakglass` block so plain `ssh breakglass` knocks automatically.
|
||||
|
||||
### 4.8 Cold-scenario IP cheat sheet (DNS is down when the cluster is down)
|
||||
Technitium + AdGuard are in-cluster, so `.lan` resolution is gone in a cold
|
||||
event. Use IPs:
|
||||
|
||||
| Host | IP |
|---|---|
| Proxmox host | `192.168.1.127` (also `10.0.10.1` VLAN10) |
| pfSense | `10.0.20.1` (WAN `192.168.1.2`) |
| k8s API server | `10.0.20.100` |
| Synology NAS | `192.168.1.13` |
| Edge router | `192.168.1.1` |
| Traefik LB / MetalLB | `10.0.20.200` / `10.0.20.203` |
|
||||
|
||||
## 5. Security analysis
|
||||
|
||||
- **Brute force: solved.** No password auth anywhere → password guessing is
|
||||
impossible; key brute force is cryptographically infeasible.
|
||||
- **Invisibility / Wave 1 intent: satisfied.** The external SSH port is
|
||||
default-dropped and the knock ports are pcap-sniffed (never answered), so a
|
||||
scanner sees a closed/silent host — PVE sshd is **not internet-scannable**,
|
||||
honouring the spirit of "no public-IP access to PVE sshd".
|
||||
- **The knock is obscurity, not cryptography.** A port-knock sequence is
|
||||
plaintext and replayable by a passive on-path observer. **The SSH key is the
|
||||
real access control** — the knock only removes the standing/scannable surface.
|
||||
(Cryptographic SPA = fwknop, rejected for needing a client install.) Treat the
|
||||
knock sequence as a secret-ish convenience, not a second cryptographic factor.
|
||||
- **Residual risks** (none are brute force):
|
||||
1. An sshd **0-day** exploitable during the 30 s open window → mitigation: keep
|
||||
PVE patched; short `cmd_timeout`; fail2ban.
|
||||
2. **Private key theft** → mitigation: key has a passphrase; revoke by removing
|
||||
the line from `authorized_keys`.
|
||||
3. If `.1` **SNATs** (§4.1), the 30 s window opens `:52222` for the shared
|
||||
`192.168.1.1` source — anyone else arriving via `.1` in that window could
|
||||
reach the sshd banner, but still needs your key. Mitigated by the short
|
||||
window + key-only + fail2ban.
|
||||
- **Deliberate, documented exception** to the Wave 1 "no public-IP access"
|
||||
policy, scoped to this single knock-gated port. To be recorded in
|
||||
`security.md` + the Wave 1 note in `infra/.claude/CLAUDE.md` on implementation.
|
||||
|
||||
## 6. What's automated vs manual
|
||||
|
||||
- **I do**: generate the keypair + knock sequence, store them in Vault, produce
|
||||
the exact `sshd_config.d` snippet, `knockd.conf`, iptables rules, the client
|
||||
`~/.ssh/config` + `bg()` function, and write the runbook + doc updates.
|
||||
- **Manual / careful (live devices)**: the `.1` edge-router forwards are done by
|
||||
you in the browser (out-of-Terraform, live device). The Proxmox host changes
|
||||
(sshd, knockd, iptables, fail2ban) are applied over SSH **with key-login
|
||||
verified first** to avoid lockout; pfSense is **not** touched. None of this is
|
||||
a `tg apply` — pfSense and the edge router are not Terraform-managed.
|
||||
|
||||
## 7. Testing & verification
|
||||
1. From an **external** network (phone hotspot): run `bg`; confirm knockd syslog
|
||||
shows the sequence + opens `:52222`; SSH succeeds.
|
||||
2. **Without** knocking: `ssh -p 52222` from external → connection times out
(the `DROP` rule discards packets silently, so expect a timeout rather than a refusal). A plain port scan of `52222` + the knock ports → silent.
|
||||
3. LAN admin on `:22` still works (no regression); backup rsync jobs unaffected.
|
||||
4. Full-LAN: `ssh -J breakglass 10.0.20.1` (pfSense) and `ssh -D 1080` SOCKS to
|
||||
an internal IP.
|
||||
5. Determine `.1` source-IP behaviour (verify #1) and adjust knock granularity
|
||||
note accordingly.
|
||||
|
||||
## 8. Failure modes & rotation
|
||||
- **Proxmox host down** (not just cluster): this path is gone — that's the
|
||||
out-of-band tier (serial/IPMI/separate device), explicitly **out of scope**.
|
||||
- **`.1` router config reset**: forwards lost → re-add from this doc; consider
|
||||
exporting the `.1` config for backup.
|
||||
- **Public IP change**: use a hostname endpoint (Cloudflare-resolved) so it
|
||||
auto-follows; keep the raw IP as fallback.
|
||||
- **Key/knock compromise**: remove the `authorized_keys` line (kills access
|
||||
instantly); rotate the knock sequence in `knockd.conf` + Vault.
|
||||
|
||||
## 9. Out of scope
|
||||
- Host-down / site-down out-of-band access (IPMI, LTE) — a future tier.
|
||||
- Phone access (would need an SSH **app**, e.g. Termius — outside the
|
||||
"pre-installed Linux/macOS" constraint; laptop is the target).
|
||||
|
||||
## 10. Docs to update on implementation
|
||||
- `docs/architecture/vpn.md` — add a "Break-glass SSH" section.
|
||||
- `docs/architecture/security.md` + Wave 1 note in `infra/.claude/CLAUDE.md` —
|
||||
record the deliberate knock-gated exception to "no public PVE sshd".
|
||||
- New runbook `docs/runbooks/breakglass-ssh.md` — connect + rotate procedure.
|
||||
395
docs/plans/2026-05-30-breakglass-ssh-access-plan.md
Normal file
|
|
@@ -0,0 +1,395 @@
|
|||
# Break-Glass SSH Access — Implementation Plan
|
||||
|
||||
> **⚠️ SUPERSEDED 2026-06-11** by the redesign in
|
||||
> `2026-06-11-breakglass-ssh-redesign-design.md` (port-knock removed). Retained
|
||||
> for history. As-built: `docs/runbooks/breakglass-ssh.md`.
|
||||
|
||||
> **Execution model:** This plan mutates **live devices** (the Proxmox host's sshd, and the TP-Link edge router). It is **human-gated**, NOT for autonomous subagents. Each live step is applied with anti-lockout verification, and every edge-router change is made by Viktor (or by the browse tool with explicit per-change approval). Steps use `- [ ]` checkboxes.
|
||||
|
||||
**Goal:** Stand up a cold, brute-force-proof SSH backdoor onto the LAN — key-only SSH to the Proxmox host (`192.168.1.127`) gated behind a UDP port-knock — then decommission the legacy Synology SSH exposure and tighten UPnP.
|
||||
|
||||
**Architecture:** Edge router `.1` forwards a UDP knock sequence + TCP `52222` to the Proxmox host. The host runs `knockd` (libpcap) which opens `52222` for the knocker's IP for 30 s; `sshd` listens on `:22` (LAN, always) and `:52222` (external, knock-gated), key-only. Path bypasses pfSense + the k8s cluster. Client uses only stock `ssh` + `bash`.
|
||||
|
||||
**Tech stack:** OpenSSH, knockd, iptables, fail2ban (Debian/PVE host); TP-Link Archer AX6000 UI (edge router); HashiCorp Vault (secrets); Docker (`/home/wizard/tools/insecure-browse` for any router automation).
|
||||
|
||||
**Reference:** design doc `2026-05-30-breakglass-ssh-access-design.md`. Router audit (current `.1` forwards) recorded in task notes + `/home/wizard/tools/insecure-browse/out/`.
|
||||
|
||||
---
|
||||
|
||||
## Pre-flight (read before starting)
|
||||
|
||||
- **Anti-lockout rule:** never disable password auth or reload sshd without an *already-open* root session held + a *new* session verified. Applies to every host step.
|
||||
- **Live-router rule:** all `.1` changes are made by Viktor in the UI (or browse-tool with explicit approval). No blind automation of router writes.
|
||||
- **Ordering rule:** the legacy Synology SSH forward (Rule 6) is **not** closed until break-glass is verified working from an external network (Phase 4 gates on Phase 4-pre verification).
|
||||
- **Host access:** PVE host reached as `ssh root@192.168.1.127` from the LAN.
|
||||
- **Commit gate:** the infra repo currently has unmerged conflicts + an in-progress provider/backend migration. Do NOT commit (Phase 6) until Viktor confirms the repo is clean.
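
The anti-lockout rule amounts to a validate, apply, verify, roll back sequence. A minimal sketch of that control flow follows; `safe_apply` and its four callback arguments are hypothetical names, and the actual steps in this plan remain manual and human-gated.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a risky change only if validation passes, and
# roll it back if the apply step or a fresh verification fails afterwards.
# Each argument is the name of a shell function or command taking no arguments.
safe_apply() {
  local validate="$1" apply="$2" verify="$3" rollback="$4"
  "$validate" || { echo "validation failed; nothing changed" >&2; return 1; }
  "$apply"    || { echo "apply failed; rolling back" >&2; "$rollback"; return 1; }
  "$verify"   || { echo "verification failed; rolling back" >&2; "$rollback"; return 1; }
  echo "change applied and verified"
}
```

For the sshd step, `validate` would wrap `sshd -t`, `apply` the `systemctl reload ssh`, and `verify` a new key login on both ports; the already-open root session is what makes a rollback possible at all.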
|
||||
|
||||
---
|
||||
|
||||
## Phase 0 — Generate secrets (no live changes)
|
||||
|
||||
### Task 0.1: Break-glass SSH keypair
|
||||
|
||||
**Files:** none in repo (secrets → Vault).
|
||||
|
||||
- [ ] **Step 1: Generate a dedicated ed25519 keypair (with passphrase)**
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.ssh
|
||||
ssh-keygen -t ed25519 -a 100 -C "breakglass-$(date +%Y%m%d)" -f ~/.ssh/breakglass_ed25519
|
||||
# set a passphrase when prompted (so a stolen laptop key isn't instantly usable)
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Store the private key + public key in Vault**
|
||||
|
||||
```bash
|
||||
vault kv patch secret/viktor \
|
||||
breakglass_ssh_privkey=@$HOME/.ssh/breakglass_ed25519 \
|
||||
breakglass_ssh_pubkey="$(cat ~/.ssh/breakglass_ed25519.pub)"
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Verify the keys are retrievable**
|
||||
|
||||
```bash
|
||||
vault kv get -field=breakglass_ssh_pubkey secret/viktor
|
||||
```
|
||||
Expected: prints the `ssh-ed25519 AAAA... breakglass-YYYYMMDD` line.
|
||||
|
||||
### Task 0.2: Knock sequence
|
||||
|
||||
- [ ] **Step 1: Generate 3 random UDP knock ports**
|
||||
|
||||
```bash
|
||||
KNOCK="$(shuf -i 20000-60000 -n 3 | paste -sd, -)"; echo "$KNOCK"
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Store the sequence in Vault (keep it out of git)**
|
||||
|
||||
```bash
|
||||
vault kv patch secret/viktor breakglass_knock_sequence="$KNOCK"
|
||||
vault kv get -field=breakglass_knock_sequence secret/viktor
|
||||
```
|
||||
Expected: prints three comma-separated ports, e.g. `28411,49027,33180`.
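
A minimal sketch of a sanity check for the stored value, assuming the format produced above (three distinct ports in 20000-60000, comma-separated). The function name `valid_knock_seq` is hypothetical and not part of the plan.

```bash
#!/usr/bin/env bash
# Hypothetical check: accept only three distinct ports in 20000-60000,
# comma-separated, matching what the shuf command above produces.
valid_knock_seq() {
  local seq="$1" p
  [[ "$seq" =~ ^[0-9]+,[0-9]+,[0-9]+$ ]] || return 1
  for p in ${seq//,/ }; do
    [ "$p" -ge 20000 ] && [ "$p" -le 60000 ] || return 1
  done
  # Distinct ports: exactly three unique lines after sorting.
  [ "$(tr ',' '\n' <<<"$seq" | sort -u | wc -l | tr -d ' ')" = "3" ]
}
```

Possible use: `valid_knock_seq "$KNOCK" || echo "refusing to store malformed sequence"`.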
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 — Proxmox host: key-only SSH + knock gate (LIVE host change)
|
||||
|
||||
> Everything in this phase changes the **PVE host**; the commands below are issued from your laptop over `ssh root@192.168.1.127`. Keep a separate, already-open root session to the host alive for the entire phase.
|
||||
|
||||
### Task 1.1: Pre-checks (no changes yet)
|
||||
|
||||
- [ ] **Step 1: Confirm key login already works (anti-lockout baseline)**
|
||||
|
||||
The break-glass key is only authorized in Task 1.2, so for now confirm from your laptop that your *existing* admin key works:
|
||||
```bash
|
||||
ssh -o PasswordAuthentication=no root@192.168.1.127 'echo KEY_LOGIN_OK'
|
||||
```
|
||||
Expected: `KEY_LOGIN_OK` (key auth works → safe to disable passwords later). If it prompts for a password or fails with `Permission denied`, STOP and fix key auth first.
|
||||
|
||||
- [ ] **Step 2: Check whether the PVE firewall is active (coexistence)**
|
||||
|
||||
```bash
|
||||
ssh root@192.168.1.127 'pve-firewall status 2>/dev/null; iptables -S | head'
|
||||
```
|
||||
Expected: note whether `Status: enabled/running`. If **enabled**, add the Task 1.4 rules via PVE's firewall (Datacenter→Firewall) instead of raw iptables, OR disable it if unused. If **disabled** (common), proceed with the raw-iptables approach below.
|
||||
|
||||
### Task 1.2: Authorize the break-glass key
|
||||
|
||||
- [ ] **Step 1: Append the break-glass public key to root's authorized_keys**
|
||||
|
||||
```bash
|
||||
PUB="$(vault kv get -field=breakglass_ssh_pubkey secret/viktor)"
|
||||
ssh root@192.168.1.127 "grep -qF '$PUB' /root/.ssh/authorized_keys || echo '$PUB' >> /root/.ssh/authorized_keys"
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Verify break-glass key logs in (on :22, still default)**
|
||||
|
||||
```bash
|
||||
ssh -i ~/.ssh/breakglass_ed25519 -o PasswordAuthentication=no root@192.168.1.127 'echo BREAKGLASS_KEY_OK'
|
||||
```
|
||||
Expected: `BREAKGLASS_KEY_OK`.
|
||||
|
||||
### Task 1.3: sshd dual-port + key-only
|
||||
|
||||
**Files:** Create on host: `/etc/ssh/sshd_config.d/10-breakglass.conf`
|
||||
|
||||
- [ ] **Step 1: Write the sshd drop-in**
|
||||
|
||||
```bash
ssh root@192.168.1.127 'cat > /etc/ssh/sshd_config.d/10-breakglass.conf' <<'EOF'
Port 22
Port 52222
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
PermitRootLogin prohibit-password
MaxAuthTries 3
LoginGraceTime 20
EOF
```
|
||||
|
||||
- [ ] **Step 2: Validate config syntax (do NOT reload yet)**
|
||||
|
||||
```bash
|
||||
ssh root@192.168.1.127 'sshd -t && echo SSHD_CONFIG_OK'
|
||||
```
|
||||
Expected: `SSHD_CONFIG_OK`. If error, fix the drop-in before reloading.
|
||||
|
||||
- [ ] **Step 3: Reload sshd (current session stays alive)**
|
||||
|
||||
```bash
|
||||
ssh root@192.168.1.127 'systemctl reload ssh && echo RELOADED'
|
||||
```
|
||||
Expected: `RELOADED`.
|
||||
|
||||
- [ ] **Step 4: Verify a NEW key session works on :22 AND :52222 before trusting it**
|
||||
|
||||
```bash
|
||||
ssh -i ~/.ssh/breakglass_ed25519 -p 22 root@192.168.1.127 'echo OK22'
|
||||
ssh -i ~/.ssh/breakglass_ed25519 -p 52222 root@192.168.1.127 'echo OK52222'
|
||||
```
|
||||
Expected: `OK22` and `OK52222`. (If `:52222` refuses, sshd may not have bound the second port — check `ss -tlnp | grep ssh` on the host.) Only after both succeed, the old session is safe to drop.
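
The `ss` check mentioned above can be scripted. A minimal sketch, assuming the usual `ss -tln` column layout (state in field 1, local address and port in field 4); the function name `ports_listening` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ss -tln` output on stdin and succeed only if
# every port given as an argument has a listening socket.
ports_listening() {
  local listing port
  listing="$(cat)"
  for port in "$@"; do
    # Field 4 is Local Address:Port, e.g. 0.0.0.0:22 or [::]:22.
    awk -v p="$port" '$1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 } END { exit !found }' \
      <<<"$listing" || { echo "not listening: $port" >&2; return 1; }
  done
}
```

Possible use: `ssh root@192.168.1.127 'ss -tln' | ports_listening 22 52222`.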
|
||||
|
||||
### Task 1.4: Base firewall (default-drop :52222, allow :22 + established)
|
||||
|
||||
**Files:** Create on host: `/usr/local/sbin/breakglass-firewall.sh`, `/etc/systemd/system/breakglass-firewall.service`
|
||||
|
||||
- [ ] **Step 1: Write the idempotent base-firewall script (dedicated chain)**
|
||||
|
||||
```bash
ssh root@192.168.1.127 'cat > /usr/local/sbin/breakglass-firewall.sh' <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# Idempotent: (re)build a dedicated BREAKGLASS chain hooked into INPUT.
iptables -N BREAKGLASS 2>/dev/null || iptables -F BREAKGLASS
iptables -C INPUT -j BREAKGLASS 2>/dev/null || iptables -I INPUT 1 -j BREAKGLASS
# established/related always allowed
iptables -A BREAKGLASS -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# LAN admin on :22 always allowed (.1 does NOT forward :22 to this host, so :22 is LAN-only)
iptables -A BREAKGLASS -p tcp --dport 22 -j ACCEPT
# external SSH on :52222 closed by default; knockd punches a per-source ACCEPT into INPUT pos 1
iptables -A BREAKGLASS -p tcp --dport 52222 -j DROP
EOF
ssh root@192.168.1.127 'chmod 0755 /usr/local/sbin/breakglass-firewall.sh'
```
|
||||
|
||||
- [ ] **Step 2: Write a boot-time systemd unit (persists across reboot, before knockd)**
|
||||
|
||||
```bash
ssh root@192.168.1.127 'cat > /etc/systemd/system/breakglass-firewall.service' <<'EOF'
[Unit]
Description=Break-glass base firewall (SSH knock gate)
After=network-pre.target
Before=knockd.service
Wants=network-pre.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/breakglass-firewall.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
ssh root@192.168.1.127 'systemctl daemon-reload && systemctl enable --now breakglass-firewall.service && echo FW_APPLIED'
```
|
||||
Expected: `FW_APPLIED`.
|
||||
|
||||
- [ ] **Step 3: Verify LAN :22 still works and :52222 is now dropped from LAN**
|
||||
|
||||
```bash
|
||||
ssh -i ~/.ssh/breakglass_ed25519 -p 22 root@192.168.1.127 'echo STILL_OK22' # works
|
||||
nc -z -w3 192.168.1.127 52222 && echo "OPEN(bad)" || echo "CLOSED_AS_EXPECTED" # closed pre-knock
|
||||
```
|
||||
Expected: `STILL_OK22` and `CLOSED_AS_EXPECTED`.
|
||||
|
||||
### Task 1.5: knockd
|
||||
|
||||
**Files:** Create/modify on host: `/etc/knockd.conf`, `/etc/default/knockd`
|
||||
|
||||
- [ ] **Step 1: Install knockd (host daemon — must be native, not Docker, to manage host iptables)**
|
||||
|
||||
```bash
|
||||
ssh root@192.168.1.127 'apt-get update -qq && apt-get install -y knockd && echo KNOCKD_INSTALLED'
|
||||
```
|
||||
Expected: `KNOCKD_INSTALLED`.
|
||||
|
||||
- [ ] **Step 2: Write knockd.conf with the Vault knock sequence (UDP)**
|
||||
|
||||
```bash
KNOCK="$(vault kv get -field=breakglass_knock_sequence secret/viktor)"   # e.g. 28411,49027,33180
IFS=, read -r K1 K2 K3 <<<"$KNOCK"   # split the comma-separated sequence
ssh root@192.168.1.127 "cat > /etc/knockd.conf" <<EOF
[options]
UseSyslog
Interface = vmbr0

[breakglass]
sequence = ${K1}:udp,${K2}:udp,${K3}:udp
seq_timeout = 10
start_command = /usr/sbin/iptables -I INPUT 1 -s %IP% -p tcp --dport 52222 -j ACCEPT
cmd_timeout = 30
stop_command = /usr/sbin/iptables -D INPUT -s %IP% -p tcp --dport 52222 -j ACCEPT
EOF
```
|
||||
|
||||
- [ ] **Step 3: Enable + start knockd**
|
||||
|
||||
```bash
# sed exits 0 even when no line matches, so test for the setting before editing it
ssh root@192.168.1.127 "grep -q '^START_KNOCKD=' /etc/default/knockd 2>/dev/null && sed -i 's/^START_KNOCKD=.*/START_KNOCKD=1/' /etc/default/knockd || echo 'START_KNOCKD=1' >> /etc/default/knockd"
ssh root@192.168.1.127 'systemctl enable --now knockd && systemctl is-active knockd'
```
|
||||
Expected: `active`.
|
||||
|
||||
### Task 1.6: fail2ban (defense-in-depth)
|
||||
|
||||
- [ ] **Step 1: Install + enable fail2ban with the default sshd jail**
|
||||
|
||||
```bash
|
||||
ssh root@192.168.1.127 'apt-get install -y fail2ban && systemctl enable --now fail2ban && fail2ban-client status sshd >/dev/null && echo F2B_OK'
|
||||
```
|
||||
Expected: `F2B_OK` (sshd jail active).
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 — Edge router `.1` forwards (LIVE router change — Viktor executes)
|
||||
|
||||
> In the AX6000 UI: **Advanced → NAT Forwarding → Port Forwarding → Add**. Do NOT remove anything yet.
|
||||
|
||||
- [ ] **Step 1: Add the SSH break-glass forward**
|
||||
- Name `breakglass-ssh`, External Port `52222`, Internal IP `192.168.1.127`, Internal Port `52222`, Protocol `TCP`, Enable.
|
||||
|
||||
- [ ] **Step 2: Add the three UDP knock forwards** (values from `vault kv get -field=breakglass_knock_sequence secret/viktor`)
|
||||
- For each of the 3 ports: Name `bg-knock-N`, External Port `<port>`, Internal IP `192.168.1.127`, Internal Port `<same port>`, Protocol `UDP`, Enable.
|
||||
|
||||
- [ ] **Step 3: (verify #1) Determine whether `.1` preserves source IP or SNATs**
|
||||
|
||||
After Phase 3 connects once, on the host check the observed source:
|
||||
```bash
|
||||
ssh root@192.168.1.127 'journalctl -u knockd -n 20 --no-pager | grep -i "stage\|open"'
|
||||
```
|
||||
If `%IP%` is a public IP → source preserved (per-IP granularity). If it's `192.168.1.1` → `.1` SNATs (knock opens `:52222` for the shared `.1` source during the 30 s window). Both are acceptable with the dual-port + key-only model; just note it in the runbook.
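The two outcomes above can be told apart mechanically. This is a minimal sketch, assuming the source address has already been copied out of the knockd log; `classify_source` is a hypothetical helper, not part of the plan.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify the source address knockd logged for a knock.
# Usage: classify_source <ipv4>
classify_source() {
  case "$1" in
    192.168.1.1)
      echo "snat" ;;        # edge router rewrote the source (shared window)
    10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*)
      echo "lan" ;;         # private address: the test was not run from outside
    *)
      echo "preserved" ;;   # public source IP kept (per-IP granularity)
  esac
}

classify_source 192.168.1.1
```

A `lan` result means the knock came from inside the home network, so it says nothing about the router's WAN behaviour.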
|
||||
|
||||
---
|
||||
|
||||
## Phase 3 — Client config (laptop, no live infra change)
|
||||
|
||||
**Files:** Modify `~/.ssh/config`; add a shell function to `~/.zshrc`/`~/.bashrc`.
|
||||
|
||||
- [ ] **Step 1: Add the SSH host block**
|
||||
|
||||
```bash
|
||||
cat >> ~/.ssh/config <<'EOF'
|
||||
|
||||
Host breakglass
|
||||
HostName viktorbarzin.ddns.net
|
||||
Port 52222
|
||||
User root
|
||||
IdentityFile ~/.ssh/breakglass_ed25519
|
||||
EOF
|
||||
```
|
||||
(`viktorbarzin.ddns.net` is the router's NO-IP DDNS name — follows the dynamic WAN IP. Raw IP `176.12.22.76` is the fallback.)
|
||||
|
||||
- [ ] **Step 2: Add the knock+connect function**
|
||||
|
||||
```bash
|
||||
cat >> ~/.zshrc <<'EOF'
|
||||
|
||||
bg() {
|
||||
local host="viktorbarzin.ddns.net"
|
||||
local seq; seq="$(vault kv get -field=breakglass_knock_sequence secret/viktor 2>/dev/null || echo "")"
|
||||
[ -z "$seq" ] && { echo "no knock sequence (vault?)"; return 1; }
|
||||
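for p in $(printf '%s' "$seq" | tr ',' ' '); do printf x | nc -u -w1 "$host" "$p"; sleep 0.4; done
sleep 0.5
ssh breakglass "$@"
}
EOF
```

> Note: the knocks are sent with `nc -u` rather than bash's `/dev/udp/<host>/<port>` redirection because this function lives in `~/.zshrc`. zsh does not implement `/dev/udp`, and a file descriptor opened inside a `( … )` subshell is not visible to the parent shell, so `echo "x" >&3` after such a subshell never sends anything. The port list is split with `tr` inside `$( … )` because zsh does not word-split an unquoted `${seq//,/ }`. Pipe `nc` a byte (`printf x | nc -u -w1 "$host" "$p"`); with empty stdin some `nc` builds send no datagram at all.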
|
||||
|
||||
---
|
||||
|
||||
## Phase 4-pre — Verify break-glass END-TO-END (gates Phase 5)
|
||||
|
||||
> Do this from an **external** network (phone hotspot / tethered), NOT the home LAN.
|
||||
|
||||
- [ ] **Step 1: Without knocking, the port is silent**
|
||||
|
||||
```bash
|
||||
nc -z -w3 viktorbarzin.ddns.net 52222 && echo "OPEN(bad)" || echo "SILENT_OK"
|
||||
```
|
||||
Expected: `SILENT_OK`.
|
||||
|
||||
- [ ] **Step 2: Knock + connect succeeds**
|
||||
|
||||
```bash
|
||||
bg 'hostname; echo BREAKGLASS_E2E_OK'
|
||||
```
|
||||
Expected: the PVE hostname + `BREAKGLASS_E2E_OK`.
|
||||
|
||||
- [ ] **Step 3: Full-LAN reach via the jump (no extra install)**
|
||||
|
||||
```bash
|
||||
ssh -J breakglass root@10.0.20.1 'echo PFSENSE_REACHED' 2>/dev/null || echo "check pfSense ssh"
|
||||
ssh -J breakglass admin@192.168.1.13 'echo SYNOLOGY_REACHED' 2>/dev/null || echo "check synology ssh"
|
||||
```
|
||||
Expected: confirms you can reach pfSense + Synology *through* break-glass (so closing Rule 6 loses nothing).
|
||||
|
||||
- [ ] **Step 4: LAN admin unaffected**
|
||||
|
||||
From the home LAN: `ssh -p 22 root@192.168.1.127 'echo LAN22_OK'` → `LAN22_OK`.
|
||||
|
||||
**GATE:** Only proceed to Phase 5 once Steps 1–4 pass. If any fail, fix before removing the legacy forward.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5 — Router cleanup (LIVE router change — Viktor executes, AFTER Phase 4-pre passes)
|
||||
|
||||
> AX6000 UI. One pass, all three changes.
|
||||
|
||||
- [ ] **Step 1: Remove the Synology SSH exposure (Rule 6)**
|
||||
- Advanced → NAT Forwarding → Port Forwarding → delete (or disable) rule **`HTTP` / 3333 → 192.168.1.13:22**.
|
||||
|
||||
- [ ] **Step 2: Delete the stale Proxmox rule (Rule 3)**
|
||||
- Delete the disabled rule **`proxmox` / 8006 → 192.168.1.127**.
|
||||
|
||||
- [ ] **Step 3: Disable UPnP**
|
||||
- Advanced → NAT Forwarding → UPnP → toggle **OFF**. (Tailscale on `.101` falls back to DERP relay; the `41643→pfSense` mapping drops.)
|
||||
|
||||
- [ ] **Step 4: Verify the Synology SSH is gone from the WAN, break-glass still works**
|
||||
|
||||
From an external network:
|
||||
```bash
|
||||
nc -z -w3 viktorbarzin.ddns.net 3333 && echo "STILL_OPEN(bad)" || echo "SYNOLOGY_SSH_CLOSED_OK"
|
||||
bg 'echo BREAKGLASS_STILL_OK'
|
||||
```
|
||||
Expected: `SYNOLOGY_SSH_CLOSED_OK` and `BREAKGLASS_STILL_OK`.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6 — Docs + commit (AFTER infra repo is clean)
|
||||
|
||||
- [ ] **Step 1: Update `docs/architecture/vpn.md`** — add a "Break-glass SSH" section (knock-gated SSH to PVE host, client `bg()`, cheat-sheet IPs).
|
||||
- [ ] **Step 2: Update `docs/architecture/security.md` + the Wave-1 note in `infra/.claude/CLAUDE.md`** — record the deliberate knock-gated exception; **correct the WAN-exposure inventory** (actual `.1` forwards are qbittorrent/stun/turn→pfSense + the new break-glass; Synology SSH removed; UPnP disabled; Remote Management off).
|
||||
- [ ] **Step 3: New runbook `docs/runbooks/breakglass-ssh.md`** — connect procedure, knock/key rotation, re-adding `.1` forwards after a router reset.
|
||||
- [ ] **Step 4: Commit the design + plan + doc updates** (only once Viktor confirms the repo is committable):
|
||||
|
||||
```bash
|
||||
git -C /home/wizard/code/infra add \
|
||||
docs/plans/2026-05-30-breakglass-ssh-access-design.md \
|
||||
docs/plans/2026-05-30-breakglass-ssh-access-plan.md \
|
||||
docs/architecture/vpn.md docs/architecture/security.md \
|
||||
docs/runbooks/breakglass-ssh.md .claude/CLAUDE.md
|
||||
git -C /home/wizard/code/infra commit -m "docs+feat: break-glass knock-gated SSH; retire Synology SSH forward; disable UPnP [ci skip]"
|
||||
git -C /home/wizard/code/infra push origin master
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Self-review
|
||||
|
||||
- **Spec coverage:** key-only SSH ✅ (1.3), knock gate ✅ (1.4/1.5), invisibility ✅ (4-pre.1), full-LAN via jump ✅ (4-pre.3), no-lockout ✅ (1.1/1.3.4), Wave-1 exception doc ✅ (6.2), close legacy SSH ✅ (5.1), UPnP ✅ (5.3). All design §sections map to a task.
|
||||
- **Placeholder scan:** no TBDs; secret values are generated + Vault-stored, referenced via `vault kv get` (concrete, not placeholders).
|
||||
- **Consistency:** port `52222`, knock from `secret/viktor/breakglass_knock_sequence`, key `~/.ssh/breakglass_ed25519`, host `192.168.1.127` used consistently throughout.
|
||||
- **Open verify items** (flagged inline, non-blocking): #1 `.1` SNAT behaviour (2.3), pve-firewall coexistence (1.1.2).
|
||||
73
docs/plans/2026-06-11-breakglass-ssh-redesign-design.md
Normal file
|
|
@ -0,0 +1,73 @@
|
|||
# Break-glass SSH — Redesign
|
||||
|
||||
- **Date**: 2026-06-11
|
||||
- **Status**: Implemented
|
||||
- **Owner**: Viktor
|
||||
- **Supersedes**: `2026-05-30-breakglass-ssh-access-{design,plan}.md` (port-knock design)
|
||||
- **As-built runbook**: `docs/runbooks/breakglass-ssh.md`
|
||||
|
||||
## Why redesign
|
||||
|
||||
The 2026-05-30 design gated a key-only SSH port on the Proxmox host behind a UDP
|
||||
**port-knock** (knockd). It caused a real lockout, for a structural reason:
|
||||
|
||||
- The knock sequence was 3 random ports stored **only** in Vault, and the client
|
||||
helper fetched it from Vault at connect time.
|
||||
- **Vault is in-cluster** and not publicly reachable (Wave-1 policy). In the
|
||||
exact scenario break-glass exists for — away from home, cluster/tunnels down —
|
||||
the knock sequence is unreachable and unmemorable. Circular dependency.
|
||||
|
||||
The knock's only benefit was hiding an already brute-force-proof port; its cost
|
||||
was that fragility. For a *recovery* path, robustness beats stealth.
|
||||
|
||||
## Decision
|
||||
|
||||
**Plain key-only SSH to the Proxmox host on `:52222`, openly reachable, no knock.**
|
||||
Hardened with: the exposed port trusts only a dedicated break-glass key
|
||||
(`Match LocalPort`), per-source connection rate-limiting (iptables hashlimit),
|
||||
and fail2ban. Scenario covered: *cluster + tunnels down, host + pfSense + router
|
||||
up* (the common "I'm away and need in" case — confirmed with Viktor; deeper
|
||||
"pfSense wedged" / "host down" tiers are explicitly out of scope).
|
||||
|
||||
Alternatives considered and rejected: keeping the knock (fragile, circular);
|
||||
Tailscale-on-pfSense (briefly chosen, then dropped — reintroduces the upstream
|
||||
dependency Headscale is self-hosted to avoid, and the user preferred a
|
||||
self-contained stock-ssh path); WireGuard road-warrior (needs a client, and the
|
||||
self-contained SSH path was preferred).
|
||||
|
||||
## Components
|
||||
|
||||
| Layer | Change | Source of truth |
|
||||
|---|---|---|
|
||||
| sshd | dual-port `:22` (LAN, all keys) + `:52222` (WAN, break-glass key only via `Match LocalPort`, terminated by `Match all`); key-only everywhere | `scripts/sshd-10-breakglass.conf` |
|
||||
| host firewall | `BREAKGLASS` chain: `:52222` rate-limited per source, LAN bypass; replaced the knock-gated default-DROP | `scripts/breakglass-firewall.sh` (+ `breakglass-firewall.service`) |
|
||||
| fail2ban | jail fixed for Debian 13 (`journalmatch` by unit, not `_COMM=sshd`, else it never bans), bans on `:22`+`:52222` | `scripts/fail2ban-breakglass-sshd.local` |
|
||||
| knockd | **removed** (package purged, config deleted) | — |
|
||||
| edge router | `breakglass-ssh` WAN tcp/52222 → 192.168.1.127:52222; **removed** legacy Synology SSH forward (ext 3333 → .13:22) | manual (live device) |
|
||||
| Vault | `breakglass_ssh_{pub,priv}key` retained; `breakglass_knock_sequence` now dead | `secret/viktor` |
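The sshd row above relies on `Match LocalPort` to scope trust per listening port. Below is a minimal sketch of such a drop-in, printed by a shell function so it can be reviewed before it is installed. It is not the contents of `scripts/sshd-10-breakglass.conf` (not shown in this document); the function name `render_breakglass_sshd_conf` and the exact directive values, including `MaxAuthTries 3`, are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: prints an sshd_config drop-in in the spirit of the design above.
# The repo file scripts/sshd-10-breakglass.conf remains the source of truth.
render_breakglass_sshd_conf() {
  local wan_port="${1:-52222}"
  cat <<EOF
# Listen on the LAN admin port and on the exposed break-glass port.
Port 22
Port ${wan_port}

# Key-only everywhere.
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes

# Connections that arrive on the exposed port may use only the break-glass key file.
Match LocalPort ${wan_port}
    AuthorizedKeysFile /root/.ssh/authorized_keys.breakglass
    MaxAuthTries 3

# Close the conditional block so anything read later applies globally again.
Match all
EOF
}

render_breakglass_sshd_conf 52222
```

Validate any such file with `sshd -t` before reloading, as the runbook's deploy step does, so a syntax error cannot lock out the LAN port as well.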
|
||||
|
||||
## Edge-router constraints discovered (TP-Link AX6000)
|
||||
|
||||
- **No port remapping** — external port must equal internal port (rejects e.g.
|
||||
`22 → 52222` as a "conflict"). All forwards are ext==int; hence `:52222` both
|
||||
sides.
|
||||
- **Port 22 is reserved** — `22 → 22` is also refused. Break-glass cannot use 22
|
||||
(Viktor's initial preference); `:52222` is the landed port.
|
||||
- **Row delete is immediate** (no confirm dialog).
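The first two constraints above can be encoded as a pre-flight check before touching the router UI. This is a sketch under the assumption that only the two observed rejections matter; `check_forward` is a hypothetical helper and does not talk to the router.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the observed AX6000 behaviour.
# Usage: check_forward <external_port> <internal_port>
check_forward() {
  local ext="$1" int="$2" p
  for p in "$ext" "$int"; do
    case "$p" in
      ''|*[!0-9]*) echo "reject: ports must be numeric"; return 2 ;;
    esac
  done
  if [ "$ext" -ne "$int" ]; then
    echo "reject: remap $ext -> $int (external must equal internal)"
    return 1
  fi
  if [ "$ext" -eq 22 ]; then
    echo "reject: port 22 is reserved by the router"
    return 1
  fi
  echo "ok: $ext -> $int"
}

check_forward 52222 52222
```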
|
||||
|
||||
## Security posture
|
||||
|
||||
- **Brute force: infeasible** (key-only; password authentication is disabled).
|
||||
- **Scannable: yes** — deliberate, documented Wave-1 exception (`security.md`).
|
||||
- **Residual risks:** sshd 0-day during exposure (mitigate: patch, rate-limit,
|
||||
fail2ban, low MaxAuthTries); break-glass key theft (revoke by removing the
|
||||
`authorized_keys.breakglass` line). Logins are audited (PVE ships sshd auth +
|
||||
snoopy execve to Loki).
|
||||
|
||||
## Verification (2026-06-11)
|
||||
|
||||
- `:52222` reachable; break-glass key authenticates (`root@pve`).
|
||||
- Non-break-glass keys **rejected** on `:52222` (Match isolation works).
|
||||
- `:22` LAN admin unaffected (Match all reset confirmed — global root login intact).
|
||||
- Full WAN path: `ssh -p 52222 <WAN-IP>` with the break-glass key → `root@pve`.
|
||||
- knockd gone; fail2ban jail matches Debian 13 `sshd-session` lines.
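The first two checks above can be kept as repeatable commands. This is a sketch that only prints the commands unless `RUN=1` is set; the host name and break-glass key path come from the runbook, while `OTHER_KEY` (any root key that is not the break-glass key) is an assumed path.

```bash
#!/usr/bin/env bash
# Sketch: dry-run by default; set RUN=1 to actually connect.
HOST="${HOST:-viktorbarzin.ddns.net}"
BG_KEY="${BG_KEY:-$HOME/.ssh/breakglass_ed25519}"
OTHER_KEY="${OTHER_KEY:-$HOME/.ssh/id_ed25519}"   # assumed path of a non-break-glass key

# Execute the command when RUN=1, otherwise print it.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# ssh_with_key <key> <remote command...>: offer exactly one key, never prompt.
ssh_with_key() {
  local key="$1"; shift
  run ssh -p 52222 -i "$key" -o IdentitiesOnly=yes -o BatchMode=yes \
      -o ConnectTimeout=5 "root@$HOST" "$@"
}

verify_breakglass() {
  ssh_with_key "$BG_KEY" hostname   # should succeed
  ssh_with_key "$OTHER_KEY" true    # should be refused on :52222 when RUN=1
}

verify_breakglass
```

`IdentitiesOnly=yes` matters for the second check: without it an agent could offer the break-glass key anyway and mask a broken `Match` block.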
|
||||
158
docs/runbooks/breakglass-ssh.md
Normal file
|
|
@ -0,0 +1,158 @@
|
|||
# Runbook: Break-glass SSH
|
||||
|
||||
Cold-survivable, brute-force-proof SSH onto the home LAN for when the Kubernetes
|
||||
cluster and its remote-access tunnels (Headscale, cloudflared) are down but the
|
||||
**Proxmox host + edge router are up**. Redesigned 2026-06-11 — the previous
|
||||
port-knock design is decommissioned (see "History" below).
|
||||
|
||||
## Model (as built)
|
||||
|
||||
```
|
||||
your laptop (anywhere) ── ssh -p 52222 ──▶ edge router 192.168.1.1
|
||||
│ WAN tcp/52222 ─▶ 192.168.1.127:52222
|
||||
▼
|
||||
Proxmox host 192.168.1.127
|
||||
sshd :52222 (key-only, break-glass key ONLY)
|
||||
→ full LAN via ssh -J / ssh -D
|
||||
```
|
||||
|
||||
- **No port-knock.** Plain `ssh -p 52222`. The SSH key is the only gate.
|
||||
- **Key-only**, brute-force-proof. The exposed `:52222` trusts **only** the
|
||||
dedicated break-glass key (`/root/.ssh/authorized_keys.breakglass`), separate
|
||||
from root's normal LAN-admin keys, so it is independently revocable and a leak
|
||||
of any other root key does not grant internet access.
|
||||
- **Rate-limited** per source IP (iptables hashlimit) + **fail2ban**. These trim
|
||||
scanner noise only; key-only auth is the real protection.
|
||||
- **Exposed, not hidden.** `:52222` answers on the WAN (Shodan-visible). This is
|
||||
a deliberate, documented exception to the Wave-1 "no public-IP access" policy
|
||||
(see `docs/architecture/security.md`), chosen for self-containment: it has **no
|
||||
dependency on the cluster** (unlike Headscale/cloudflared) and nothing to
|
||||
remember (unlike the old knock, whose sequence lived only in in-cluster Vault).
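The rate-limit bullet above could be realized roughly as follows. This is a sketch, not the repo's `scripts/breakglass-firewall.sh`: the LAN CIDR, the 6/minute rate and the chain layout are assumptions. It prints the rules unless `APPLY=1` is set, which needs root.

```bash
#!/usr/bin/env bash
# Sketch of a per-source rate limit for the exposed port using the hashlimit match.
LAN_CIDR="${LAN_CIDR:-192.168.1.0/24}"   # assumed home LAN range
PORT="${PORT:-52222}"

# Apply the rule when APPLY=1, otherwise print it for review.
ipt() { if [ "${APPLY:-0}" = 1 ]; then iptables "$@"; else printf 'iptables %s\n' "$*"; fi; }

breakglass_rules() {
  ipt -N BREAKGLASS
  # Packets of already accepted connections pass without counting.
  ipt -A BREAKGLASS -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
  # LAN sources bypass the limit.
  ipt -A BREAKGLASS -s "$LAN_CIDR" -j ACCEPT
  # Everything else: one bucket per source IP, dropped once the bucket is empty.
  ipt -A BREAKGLASS -m hashlimit --hashlimit-name breakglass --hashlimit-mode srcip \
      --hashlimit-upto 6/minute --hashlimit-burst 6 -j ACCEPT
  ipt -A BREAKGLASS -j DROP
  # Send only traffic for the exposed port through the chain.
  ipt -I INPUT 1 -p tcp --dport "$PORT" -j BREAKGLASS
}

breakglass_rules
```

The sketch is not idempotent (`-N` fails if the chain already exists), so a real script would flush or check for the chain first.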
|
||||
|
||||
## Secrets (Vault `secret/viktor`)
|
||||
|
||||
| Key | Use |
|
||||
|---|---|
|
||||
| `breakglass_ssh_pubkey` | authorized on the host (`authorized_keys.breakglass`) |
|
||||
| `breakglass_ssh_privkey` | the private key (also on your laptop at `~/.ssh/breakglass_ed25519`) |
|
||||
|
||||
The key has **no passphrase** (so it works in a true cold event without anything
|
||||
to recall). Treat the private key as the sole credential — guard the laptop copy.
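Because Vault is unreachable in a cold event, the laptop copy has to be installed ahead of time. This is a minimal sketch of doing that with safe permissions; `install_breakglass_key` is a hypothetical helper that reads the key material on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write key material from stdin to a file readable only by the owner.
# Usage: <producer> | install_breakglass_key [destination]
install_breakglass_key() {
  local dest="${1:-$HOME/.ssh/breakglass_ed25519}"
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077; mkdir -p "$(dirname "$dest")" && cat > "$dest" ) || return 1
  chmod 600 "$dest"
}

# Example (run while Vault is still reachable):
#   vault kv get -field=breakglass_ssh_privkey secret/viktor | install_breakglass_key
```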
|
||||
|
||||
> Leftover: `breakglass_knock_sequence` is dead (knock decommissioned). It is
|
||||
> inert; remove it when you have a Vault token with the `patch` capability
|
||||
> (`vault kv patch` / merge-patch — the everyday token lacks it).
|
||||
|
||||
## Connect
|
||||
|
||||
Client `~/.ssh/config`:
|
||||
|
||||
```
|
||||
Host breakglass
|
||||
HostName viktorbarzin.ddns.net # follows the dynamic WAN IP
|
||||
Port 52222
|
||||
User root
|
||||
IdentityFile ~/.ssh/breakglass_ed25519
|
||||
IdentitiesOnly yes
|
||||
```
|
||||
|
||||
Then:
|
||||
|
||||
```bash
|
||||
ssh breakglass # shell on the Proxmox host
|
||||
ssh -J breakglass root@10.0.20.1 # jump to pfSense (or any LAN host)
|
||||
ssh -D 1080 breakglass # SOCKS5 → reach any internal IP
|
||||
```
|
||||
|
||||
There is **no `bg()` knock function** anymore — delete it from your shell rc if
|
||||
you added it under the old design.
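To check whether a leftover definition is still present, something like the following can be used. It is a sketch; `find_stale_bg` is a hypothetical helper and only reports matches, it does not edit any file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list rc files (with line numbers) that still define bg().
# Usage: find_stale_bg [file...]   (defaults to ~/.zshrc and ~/.bashrc)
find_stale_bg() {
  [ "$#" -gt 0 ] || set -- "$HOME/.zshrc" "$HOME/.bashrc"
  local f
  for f in "$@"; do
    [ -f "$f" ] || continue
    grep -nHE '^[[:space:]]*bg[[:space:]]*\(\)' "$f"
  done
  return 0
}

find_stale_bg
```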
|
||||
|
||||
## Cold-event IP cheat sheet (cluster DNS is down)
|
||||
|
||||
| Host | IP |
|
||||
|---|---|
|
||||
| Proxmox host | `192.168.1.127` |
|
||||
| pfSense | `10.0.20.1` (WAN `192.168.1.2`) |
|
||||
| k8s API | `10.0.20.100` |
|
||||
| Synology NAS | `192.168.1.13` (reach via `ssh -J breakglass`) |
|
||||
| edge router | `192.168.1.1` |
|
||||
|
||||
## Deploy / re-provision the host config
|
||||
|
||||
Source of truth lives in `infra/scripts/`. To (re)deploy:
|
||||
|
||||
```bash
|
||||
# 1. break-glass key authorized for the exposed port
|
||||
PUB="$(vault kv get -field=breakglass_ssh_pubkey secret/viktor)"
|
||||
printf '%s\n' "$PUB" | ssh root@192.168.1.127 'cat > /root/.ssh/authorized_keys.breakglass && chmod 600 /root/.ssh/authorized_keys.breakglass'
|
||||
|
||||
# 2. sshd drop-in (dual-port, Match-isolated) — validate before reload (anti-lockout)
|
||||
scp scripts/sshd-10-breakglass.conf root@192.168.1.127:/etc/ssh/sshd_config.d/10-breakglass.conf
|
||||
ssh root@192.168.1.127 'sshd -t && systemctl reload ssh'
|
||||
|
||||
# 3. firewall (rate-limit) + boot unit
|
||||
scp scripts/breakglass-firewall.sh root@192.168.1.127:/usr/local/sbin/breakglass-firewall.sh
|
||||
ssh root@192.168.1.127 'chmod 0755 /usr/local/sbin/breakglass-firewall.sh && systemctl enable --now breakglass-firewall.service'
|
||||
|
||||
# 4. fail2ban jail
|
||||
scp scripts/fail2ban-breakglass-sshd.local root@192.168.1.127:/etc/fail2ban/jail.d/breakglass-sshd.local
|
||||
ssh root@192.168.1.127 'systemctl restart fail2ban && fail2ban-client status sshd'
|
||||
```
|
||||
|
||||
The `breakglass-firewall.service` unit (oneshot, `RemainAfterExit=yes`, ordered
after `network-pre.target`) is a manual host unit; recreate it if the host is
rebuilt:
|
||||
|
||||
```ini
|
||||
[Unit]
|
||||
Description=Break-glass base firewall (key-only SSH on :52222)
|
||||
After=network-pre.target
|
||||
Wants=network-pre.target
|
||||
[Service]
|
||||
Type=oneshot
|
||||
ExecStart=/usr/local/sbin/breakglass-firewall.sh
|
||||
RemainAfterExit=yes
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
```
|
||||
|
||||
## Edge-router forward (manual — live device, not Terraform)
|
||||
|
||||
TP-Link Archer AX6000 (`192.168.1.1`) → Advanced → NAT Forwarding → Port
|
||||
Forwarding. The break-glass rule:
|
||||
|
||||
| Service Name | Device IP | External Port | Internal Port | Protocol |
|
||||
|---|---|---|---|---|
|
||||
| `breakglass-ssh` | `192.168.1.127` | `52222` | `52222` | TCP |
|
||||
|
||||
**AX6000 quirks (learned 2026-06-11 — do not relearn the hard way):**
|
||||
- **External port must equal internal port.** The firmware rejects any remap
|
||||
(e.g. `22 → 52222`) with *"External Port: This item conflicts with existed
|
||||
ones."* Hence ext==int 52222.
|
||||
- **Port 22 is reserved** — even `22 → 22` is refused. Break-glass cannot use 22.
|
||||
- **Row delete is immediate** (no confirm dialog) — clicking the trash icon
|
||||
removes the rule and toasts "Operation succeeded".
|
||||
- Automation: `~/wizard/tools/insecure-browse/add-forward.{sh,js}` (dockerized
|
||||
Playwright; double-gated save `DRY_RUN=0 CONFIRM_SAVE=1`; supports
|
||||
`RULES_JSON` add, `EDIT_RULES_JSON` protocol-edit, `DELETE_RULES_JSON`
|
||||
identity-guarded delete). Router password: Vault
|
||||
`secret/viktor/edge_router_192_168_1_1_password`.
|
||||
|
||||
## Rotate / revoke
|
||||
|
||||
- **Revoke instantly:** remove the line from `/root/.ssh/authorized_keys.breakglass`.
|
||||
- **Rotate the key:** `ssh-keygen -t ed25519 -a 100 -f ~/.ssh/breakglass_ed25519`,
|
||||
`vault kv patch secret/viktor breakglass_ssh_privkey=@... breakglass_ssh_pubkey=...`,
|
||||
redeploy step 1 above.
|
||||
- **Router reset wipes forwards:** re-add the `breakglass-ssh` rule above.
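Revocation by hand is a one-line edit, but during a rotation it helps to remove exactly the old key and nothing else. This is a sketch that matches on the base64 key blob; `revoke_key` is a hypothetical helper, meant to be run on the host.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop one public key from an authorized_keys-style file.
# Usage: revoke_key <authorized_keys_file> <public_key_file>
# Matches on the key blob (field 2 of the .pub line), so the comment may differ.
revoke_key() {
  local auth="$1" pub="$2" blob tmp
  [ -f "$auth" ] && [ -f "$pub" ] || { echo "missing file" >&2; return 1; }
  blob="$(awk '{print $2; exit}' "$pub")"
  [ -n "$blob" ] || { echo "no key blob in $pub" >&2; return 1; }
  tmp="$(mktemp)" || return 1
  # grep exits 1 when no line survives; an empty result is still valid here.
  grep -vF -- "$blob" "$auth" > "$tmp"
  cat "$tmp" > "$auth"   # overwrite in place so owner and mode are kept
  rm -f "$tmp"
}
```

For example, `revoke_key /root/.ssh/authorized_keys.breakglass /tmp/old_breakglass.pub`, where the second path is wherever the old public key was copied to (an assumed location).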
|
||||
|
||||
## History
|
||||
|
||||
- **2026-05-30:** original design — key-only SSH on `:52222` gated behind a
|
||||
**UDP port-knock** (knockd). Decommissioned 2026-06-11: the knock added no real
|
||||
security (the SSH key already makes the port brute-force-proof) and its only
|
||||
benefit — hiding the port — came at the cost of a **circular dependency**: the
|
||||
knock sequence lived only in in-cluster Vault, unreachable in the exact
|
||||
cold/away scenario break-glass exists for. That caused a real lockout. The
|
||||
knockd package + config + the legacy Synology SSH forward (ext 3333 → .13:22)
|
||||
were removed.
|
||||