break-glass SSH: drop port-knock for exposed key-only :52222; version host config
Viktor got locked out of the break-glass path (forgot the port-knock setup) and deleted the edge-router forwards, then asked to review and redesign it from scratch.

Root cause of the lockout: the knock added no real security (key-only SSH is already brute-force-proof) and its only benefit — hiding the port — came at the cost of a circular dependency. The knock sequence lived only in in-cluster Vault, which is unreachable in the exact away/cold scenario break-glass exists for. So the unlock secret was unavailable precisely when needed.

New model (self-contained, nothing to remember): plain key-only SSH on the Proxmox host's :52222, openly reachable. The edge router forwards WAN tcp/52222 -> 192.168.1.127:52222 (external port MUST equal internal on the TP-Link AX6000 - it rejects remaps; port 22 itself is reserved). The exposed port trusts only a dedicated break-glass key via `Match LocalPort` (a leak of any other root key does not grant internet access), rate-limited (iptables hashlimit) + fail2ban.

- Removed knockd (package + config) and the legacy Synology SSH forward (ext 3333 -> .13:22, a needless WAN exposure the original plan wanted gone).
- Fixed the fail2ban jail for Debian 13 (auth logs under sshd-session, not sshd - the stock journalmatch silently never banned).
- Versioned the host config in scripts/ (it was applied ad-hoc, never committed) and recorded the deliberate Wave-1 "no public-IP" exception in security.md + .claude/CLAUDE.md. Superseded the 2026-05-30 port-knock design docs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent e2788d1b2d
commit df332b59e6
9 changed files with 989 additions and 1 deletion
@@ -178,7 +178,7 @@ Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handle
 Plan in `docs/architecture/security.md` + response playbook in `docs/runbooks/security-incident.md`. Beads epic: `code-8ywc`.

 - **Identity allowlist for security rules**: ONLY `me@viktorbarzin.me`. NOT `viktor@viktorbarzin.me`, NOT `emo@viktorbarzin.me` (those don't exist). emo's identity scheme is unknown — ask before assuming.
-- **Source-IP allowlist (K2, K9, V7, S1)**: `10.0.20.0/22`, `192.168.1.0/24` (Proxmox + Sofia LAN), K8s pod CIDR, K8s service CIDR, Headscale tailnet. **Policy: no public-IP access** — Vault, kube-apiserver, PVE sshd must transit LAN or Headscale.
+- **Source-IP allowlist (K2, K9, V7, S1)**: `10.0.20.0/22`, `192.168.1.0/24` (Proxmox + Sofia LAN), K8s pod CIDR, K8s service CIDR, Headscale tailnet. **Policy: no public-IP access** — Vault, kube-apiserver, PVE sshd must transit LAN or Headscale. **One documented exception (2026-06-11): break-glass SSH** — PVE sshd on a WAN-exposed `:52222`, key-only, dedicated break-glass key only (`Match LocalPort`), rate-limited + fail2ban; intentionally cluster-independent so it survives an outage. As-built `docs/runbooks/breakglass-ssh.md`. (Replaced the 2026-05-30 port-knock design — circular Vault dep caused a lockout.)
 - **Response model**: (I) Slack-only daily skim. All security alerts via Loki ruler → Alertmanager → `#security` Slack receiver. Single channel with severity labels inside (critical/warning/info). No paging.
 - **Kyverno policies (wave 1)**: `deny-privileged-containers`, `deny-host-namespaces`, `restrict-sys-admin`, `require-trusted-registries` flip Audit→Enforce with the 31-namespace exclude list (memory id=1970). `failurePolicy: Ignore` preserved. Cosign `verify-images` deferred.
 - **NetworkPolicy default-deny egress (wave 1)**: observe-then-enforce (γ approach) — Calico flow logs cluster-wide + GlobalNetworkPolicy log-only on tier 3+4, build empirical allowlist after 1 week, phased per-namespace enforce starting `recruiter-responder`. Tier 0/1/2 deferred.
@@ -255,6 +255,8 @@ Routed via **Loki ruler → Alertmanager → `#security` Slack receiver**. Same

 **Policy: no public-IP access ever.** Vault, kube-apiserver, PVE sshd must transit a trusted LAN or Headscale. Anything else fires an alert.

+**Documented exception — break-glass SSH (2026-06-11):** one deliberate carve-out. The Proxmox host's sshd listens on a WAN-exposed `:52222` (edge-router forward), **key-only**, trusting only a dedicated break-glass key (`Match LocalPort` → `authorized_keys.breakglass`), rate-limited (iptables hashlimit) + fail2ban. It is intentionally reachable from the public internet so it survives a cluster/tunnel outage with no dependency on the cluster — the one case the "must transit LAN/Headscale" rule cannot serve. Brute-force-proof (no password); the trade is Shodan-visibility. As-built: `docs/runbooks/breakglass-ssh.md`; rationale: `docs/plans/2026-06-11-breakglass-ssh-redesign-design.md`. (Replaced the 2026-05-30 port-knock variant, which was non-scannable but had a circular Vault dependency that caused a lockout.)
+
 #### Why no canary tokens

 Original plan included canary tokens (fake K8s Secret, Vault KV path, PVE file, sinkhole hostname). Rejected because Viktor routinely greps `secret/viktor` (135 keys) and lists `kubectl get secret -A` — any read-trigger canary self-fires. Use-based canaries (zero-RBAC SA tokens with audit alerts on use) were also considered but rejected in favor of cleaner source-IP anomaly detection (K9, V7) on REAL tokens — same threat model, no fake-token operational burden.
285
docs/plans/2026-05-30-breakglass-ssh-access-design.md
Normal file
@@ -0,0 +1,285 @@
# Break-Glass SSH Access — Design

> **⚠️ SUPERSEDED 2026-06-11** by `2026-06-11-breakglass-ssh-redesign-design.md`.
> The port-knock was removed: it added no real security (the SSH key already
> makes the port brute-force-proof) and its knock sequence lived only in
> in-cluster Vault — unreachable in the exact cold/away scenario break-glass
> exists for, which caused a real lockout. Retained for history. As-built:
> `docs/runbooks/breakglass-ssh.md`.

- **Date**: 2026-05-30
- **Status**: Draft — pending user review
- **Owner**: Viktor
- **Related**: `docs/architecture/vpn.md`, `docs/architecture/security.md`, `infra/.claude/CLAUDE.md` (Security Posture Wave 1)

## 1. Goal

Provide a **cold, brute-force-proof backdoor onto the home LAN from the public
internet** for the case where the Kubernetes cluster and every cluster-hosted
remote-access path are down (cloudflared, Headscale/Tailscale, in-cluster
WireGuard), but the **Proxmox host, pfSense, and the edge router are still up**.

### Hard requirements (from the user)

1. **Cold-survivable**: must work when the k8s cluster + all its tunnels are
   down. The path must touch **nothing in the cluster** (no Authentik, Traefik,
   Technitium/AdGuard DNS, cloudflared).
2. **Full LAN access** once connected (SSH to Proxmox host, pfSense, Synology,
   k8s API, etc.).
3. **No brute force**: no password-guessable surface.
4. **Client uses only software pre-installed on Linux/macOS** — no WireGuard /
   Tailscale / fwknop client install. Stock `ssh` (+ `bash`) only.
5. **Minimal effort**, and ideally **honor the locked Wave 1 policy**
   (`no public-IP access — … PVE sshd must transit LAN or Headscale`).

## 2. Decision

**Key-only SSH to the Proxmox host, gated behind a UDP port-knock.**

- The Proxmox host (`192.168.1.127`) is the entry point — it's the recovery box
  (`virsh`/`qm` to reboot the pfSense VM, `kubectl`, full hypervisor control)
  and it sits directly on the `192.168.1.0/24` segment, so the path **does not
  traverse pfSense or the cluster** — it survives a wedged pfSense too, not just
  a down cluster.
- SSH is the only externally-usable remote tool **pre-installed on every
  Linux/macOS box**, satisfying requirement 4.
- **Key-only auth** (no passwords anywhere) makes password brute force
  impossible → requirement 3.
- A **port-knock** keeps the external SSH port **closed/invisible to scanners**
  until a knock sequence is sent. This restores the "no standing public service"
  property we'd have had with WireGuard and keeps us within the **intent** of the
  Wave 1 policy (PVE sshd is not internet-scannable). The knock is sent with a
  **bash `/dev/udp` one-liner** — zero install.

### Alternatives rejected

| Option | Why rejected |
|---|---|
| WireGuard road-warrior on pfSense | Needs a WireGuard **client app** (fails requirement 4). Was the prior design. |
| Tailscale / Headscale | Client app + control plane is in-cluster (dies cold). |
| Browser → web admin UI (Proxmox/pfSense/Synology) | "Pre-installed" (browser) but password-based → brute-forceable, far larger attack surface than a key-only SSH port. |
| Plain **exposed** key-only SSH (no knock) | Brute-force-proof, but a **publicly visible** service (Shodan-catalogued) and a standing violation of the Wave 1 "no public PVE sshd" policy. The knock removes the standing exposure for ~15 min more setup. |
| fwknop / cryptographic SPA | Strongest hiding, but needs a **client install** (fails requirement 4). |

## 3. Architecture

```
Your laptop (anywhere) — stock ssh + bash, nothing installed
   │  (1) UDP knock sequence → bash: echo > /dev/udp/<pub>/<port> (instant, no handshake)
   │  (2) ssh -p 52222 root@<pub>
   ▼
Edge router 192.168.1.1 (the box the stored password unlocks)
   │  forwards: UDP <k1>,<k2>,<k3> + TCP 52222 → 192.168.1.127
   ▼
Proxmox host 192.168.1.127 ← path bypasses pfSense entirely
   ├─ knockd (libpcap) sees the UDP knock → opens TCP 52222 for your source IP (30 s)
   ├─ sshd listens on :22 (LAN admin, always) AND :52222 (external, knock-gated), key-only
   └─ once in: virsh/qm (reboot pfSense VM), kubectl, ssh -J / ssh -D → full LAN
```

**Why it meets "cold + full LAN":** the host is up by definition of the chosen
failure mode; nothing in the path depends on k8s, pfSense, or DNS. From the host
you reach the whole LAN either directly (it's on `192.168.1.0/24` and routes to
the VLANs via pfSense when pfSense is up) or by using SSH's built-in
`-J`/`-D` — both stock, no install.

## 4. Components

### 4.1 Edge router @ 192.168.1.1 (manual, in the browser)
Add port-forwards (same place the existing `51821` WireGuard forward lives):
- **TCP 52222 → 192.168.1.127:52222** (external SSH; no port rewrite — see §4.3 rationale)
- **UDP `<k1>`, `<k2>`, `<k3>` → 192.168.1.127** (knock ports; actual numbers in Vault)

If the router supports a **port range** forward, a single range covering the
knock ports + 52222 is tidier than four rules.

> **Verify (#1 implementation check):** whether `.1` **preserves the source IP**
> on forwarded packets (typical DNAT) or **SNATs** them to `192.168.1.1`. Test by
> knocking + connecting from an external network and checking `/var/log/auth.log`
> + `knockd` syslog for the observed source IP. The design works either way (see
> §4.3), but it determines knock granularity.
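
The source-IP check above can be sketched as a small log filter. This is only an illustration: the function name `classify_sources` is hypothetical, and it assumes sshd's usual `Accepted publickey for <user> from <ip> port <n>` log line and IPv4 sources.

```bash
#!/usr/bin/env bash
# Sketch: classify the source IPs sshd logged for accepted key logins.
# Reads log lines on stdin; the router IP defaults to 192.168.1.1 (§4.1).
classify_sources() {
  local router_ip="${1:-192.168.1.1}" ip
  grep -oE 'Accepted publickey for [^ ]+ from [0-9.]+' |
    awk '{print $NF}' | sort -u |
    while read -r ip; do
      if [ "$ip" = "$router_ip" ]; then
        echo "$ip SNAT"        # the router rewrote the source address
      else
        echo "$ip preserved"   # the real client address reached the host
      fi
    done
}
```

On the host, something like `classify_sources < /var/log/auth.log` (or `journalctl -u ssh --since "10 min ago" | classify_sources`) right after an external login would show which case applies.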

### 4.2 SSH keys & Vault layout
- Mint a **dedicated** break-glass keypair (ed25519), separate from
  `secret/viktor/proxmox_ssh_key`, so it's independently revocable and clearly
  labelled.
- **Public key** → `/root/.ssh/authorized_keys` on the Proxmox host (no `from=`
  restriction — break-glass is from-anywhere; the knock + key are the gate).
- **Private key** → Vault `secret/viktor/breakglass_ssh_privkey` (for
  re-provisioning) **and** on your laptop at `~/.ssh/breakglass_ed25519`
  (chmod 600).
- **Knock sequence** → Vault `secret/viktor/breakglass_knock_sequence` (kept out
  of git — obscurity value only; see §5).

### 4.3 Proxmox host — sshd hardening
`/etc/ssh/sshd_config.d/10-breakglass.conf`:
```
Port 22
Port 52222
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
# key-only root (PVE recovery norm)
PermitRootLogin prohibit-password
MaxAuthTries 3
LoginGraceTime 20
```
- sshd listens on **:22 (LAN admin, always allowed)** and **:52222 (external,
  knock-gated)**. Using a dedicated external port (not a DNAT rewrite to 22)
  lets the firewall distinguish LAN vs external **regardless of `.1` SNAT
  behaviour** (§4.1) — LAN admin on `:22` is never affected by the gate.
- **Default to root key-only** for recovery practicality. *Alternative for
  review:* a dedicated `breakglass` sudo user instead of root.

> **Verify (#2):** key login already works for your normal access **before**
> `PasswordAuthentication no` is committed — no lockout. (Backup rsync jobs
> already use keys, so this is likely already effectively true.)
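
A way to confirm the drop-in actually took effect is to inspect the effective configuration that `sshd -T` prints. The sketch below is an assumption-laden helper, not part of the design: `check_key_only` is a hypothetical name, and it assumes the lowercase `keyword value` output format of `sshd -T` on a recent OpenSSH (older releases report `challengeresponseauthentication` instead of `kbdinteractiveauthentication`).

```bash
#!/usr/bin/env bash
# Sketch: verify key-only settings in the effective sshd config.
# Expects `sshd -T` output (lowercase "keyword value" lines) on stdin.
check_key_only() {
  local cfg want rc=0
  cfg="$(cat)"
  for want in "passwordauthentication no" \
              "kbdinteractiveauthentication no" \
              "pubkeyauthentication yes"; do
    # -x: the whole line must match, so "... no" never matches "... none"
    if printf '%s\n' "$cfg" | grep -qx "$want"; then
      echo "ok   $want"
    else
      echo "FAIL $want"
      rc=1
    fi
  done
  return "$rc"
}
```

Typical use would be `ssh root@192.168.1.127 'sshd -T' | check_key_only`, run from a session that is already open so a failure cannot lock anyone out.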

### 4.4 Host firewall (knock gate)
Default-drop the external SSH port; knockd punches a per-source hole. LAN admin
(`:22`) and established sessions are untouched:
```
# allow established / related
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# LAN admin + backups: SSH on :22 always allowed
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# external SSH on :52222 closed by default — knockd opens it per-source
iptables -A INPUT -p tcp --dport 52222 -j DROP
```
- **knockd uses libpcap**, so it sees the UDP knock packets even though iptables
  drops them — the knock ports stay **silent/closed** to scanners.
- **pve-firewall coexistence (verify #3):** confirm whether the PVE firewall is
  enabled. If it is, express these rules through it (or a dedicated chain) so a
  pve-firewall reload doesn't wipe the knockd-managed rule. Default PVE installs
  often have it off at datacenter level.

### 4.5 knockd
`apt install knockd` (Debian/PVE). `/etc/knockd.conf`:
```
[options]
UseSyslog
Interface = vmbr0 # the 192.168.1.127 interface

[breakglass]
sequence = <k1>:udp,<k2>:udp,<k3>:udp # real ports from Vault
seq_timeout = 10
start_command = /usr/sbin/iptables -I INPUT 1 -s %IP% -p tcp --dport 52222 -j ACCEPT
cmd_timeout = 30
stop_command = /usr/sbin/iptables -D INPUT -s %IP% -p tcp --dport 52222 -j ACCEPT
```
- **UDP knock** → the client knock is fire-and-forget (`/dev/udp`), no TCP-hang
  on the client (a TCP knock to a dropped port would block until timeout).
- Opens `:52222` for the knocker's source IP for **30 s**; an SSH session
  established within that window **persists** via conntrack ESTABLISHED after the
  rule is removed. Enable + start the `knockd` service.

### 4.6 fail2ban (defense-in-depth)
`apt install fail2ban`, sshd jail (watches `auth.log`, bans repeat failures).
Local to the host, **no cluster dependency**. Catches anything that gets past the
knock to the sshd listener.

### 4.7 Client side (laptop — stock tools only)
`~/.ssh/config`:
```
Host breakglass
  HostName <public-ip-or-dyndns>
  Port 52222
  User root
  IdentityFile ~/.ssh/breakglass_ed25519
```
Knock + connect — a shell function using **bash builtins only** (works on
macOS `/bin/bash` + Linux; UDP send is instant):
```sh
# note: defining bg() shadows bash's job-control builtin `bg` in this shell
bg() {
  local host=<public-ip-or-dyndns>
  for p in <k1> <k2> <k3>; do echo -n x > "/dev/udp/$host/$p"; sleep 0.4; done
  sleep 0.5
  ssh breakglass "$@"
}
```
- **Full LAN, no install:** `ssh -J breakglass <internal-host>` (jump), or
  `ssh -D 1080 breakglass` then point a browser/`curl` at SOCKS5 `127.0.0.1:1080`
  to reach any internal IP. From the host shell you already have everything.
- *Optional fully-transparent variant:* fold the knock into a `ProxyCommand` in
  the `Host breakglass` block so plain `ssh breakglass` knocks automatically.

### 4.8 Cold-scenario IP cheat sheet (DNS is down when the cluster is down)
Technitium + AdGuard are in-cluster, so `.lan` resolution is gone in a cold
event. Use IPs:

| Host | IP |
|---|---|
| Proxmox host | `192.168.1.127` (also `10.0.10.1` VLAN10) |
| pfSense | `10.0.20.1` (WAN `192.168.1.2`) |
| k8s API server | `10.0.20.100` |
| Synology NAS | `192.168.1.13` |
| Edge router | `192.168.1.1` |
| Traefik LB / MetalLB | `10.0.20.200` / `10.0.20.203` |
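
Since names will not resolve in a cold event, the cheat sheet can also live on the laptop as a tiny lookup function. This is a convenience sketch only: `lan_ip` and its short aliases are hypothetical names, and it covers just the unambiguous single-IP rows of the table.

```bash
#!/usr/bin/env bash
# Sketch: offline name -> IP lookup mirroring the cheat sheet above.
lan_ip() {
  case "$1" in
    proxmox|pve)  echo 192.168.1.127 ;;
    pfsense)      echo 10.0.20.1 ;;
    k8s-api)      echo 10.0.20.100 ;;
    synology|nas) echo 192.168.1.13 ;;
    router)       echo 192.168.1.1 ;;
    *) echo "unknown host: $1" >&2; return 1 ;;
  esac
}
```

It combines with the jump-host form from §4.7, for example `ssh -J breakglass "$(lan_ip pfsense)"`.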

## 5. Security analysis

- **Brute force: solved.** No password auth anywhere → password guessing is
  impossible; key brute force is cryptographically infeasible.
- **Invisibility / Wave 1 intent: satisfied.** The external SSH port is
  default-dropped and the knock ports are pcap-sniffed (never answered), so a
  scanner sees a closed/silent host — PVE sshd is **not internet-scannable**,
  honouring the spirit of "no public-IP access to PVE sshd".
- **The knock is obscurity, not cryptography.** A port-knock sequence is
  plaintext and replayable by a passive on-path observer. **The SSH key is the
  real access control** — the knock only removes the standing/scannable surface.
  (Cryptographic SPA = fwknop, rejected for needing a client install.) Treat the
  knock sequence as a secret-ish convenience, not a second cryptographic factor.
- **Residual risks** (none are brute force):
  1. An sshd **0-day** exploitable during the 30 s open window → mitigation: keep
     PVE patched; short `cmd_timeout`; fail2ban.
  2. **Private key theft** → mitigation: key has a passphrase; revoke by removing
     the line from `authorized_keys`.
  3. If `.1` **SNATs** (§4.1), the 30 s window opens `:52222` for the shared
     `192.168.1.1` source — anyone else arriving via `.1` in that window could
     reach the sshd banner, but still needs your key. Mitigated by the short
     window + key-only + fail2ban.
- **Deliberate, documented exception** to the Wave 1 "no public-IP access"
  policy, scoped to this single knock-gated port. To be recorded in
  `security.md` + the Wave 1 note in `infra/.claude/CLAUDE.md` on implementation.

## 6. What's automated vs manual

- **I do**: generate the keypair + knock sequence, store them in Vault, produce
  the exact `sshd_config.d` snippet, `knockd.conf`, iptables rules, the client
  `~/.ssh/config` + `bg()` function, and write the runbook + doc updates.
- **Manual / careful (live devices)**: the `.1` edge-router forwards are done by
  you in the browser (out-of-Terraform, live device). The Proxmox host changes
  (sshd, knockd, iptables, fail2ban) are applied over SSH **with key-login
  verified first** to avoid lockout; pfSense is **not** touched. None of this is
  a `tg apply` — pfSense and the edge router are not Terraform-managed.

## 7. Testing & verification
1. From an **external** network (phone hotspot): run `bg`; confirm knockd syslog
   shows the sequence + opens `:52222`; SSH succeeds.
2. **Without** knocking: `ssh -p 52222` from external → connection refused/timed
   out (port closed). A plain port scan of `52222` + the knock ports → silent.
3. LAN admin on `:22` still works (no regression); backup rsync jobs unaffected.
4. Full-LAN: `ssh -J breakglass 10.0.20.1` (pfSense) and `ssh -D 1080` SOCKS to
   an internal IP.
5. Determine `.1` source-IP behaviour (verify #1) and adjust knock granularity
   note accordingly.
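
The "without knocking" check in step 2 could be scripted roughly as follows. Treat it as a sketch under assumptions: `probe_port` is a hypothetical helper, it relies on bash's `/dev/tcp` pseudo-device, and it uses `timeout` from GNU coreutils, which is present on most Linux systems but not on stock macOS.

```bash
#!/usr/bin/env bash
# Sketch: report whether a TCP port accepts connections.
# A refused and a silently dropped connection both count as closed here.
probe_port() {
  local host="$1" port="$2" wait="${3:-3}"
  if timeout "$wait" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo open
  else
    echo closed-or-filtered
  fi
}
```

From an external network, `probe_port <public-ip-or-dyndns> 52222` would be expected to report `closed-or-filtered` before a knock and `open` within the window after one.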

## 8. Failure modes & rotation
- **Proxmox host down** (not just cluster): this path is gone — that's the
  out-of-band tier (serial/IPMI/separate device), explicitly **out of scope**.
- **`.1` router config reset**: forwards lost → re-add from this doc; consider
  exporting the `.1` config for backup.
- **Public IP change**: use a hostname endpoint (Cloudflare-resolved) so it
  auto-follows; keep the raw IP as fallback.
- **Key/knock compromise**: remove the `authorized_keys` line (kills access
  instantly); rotate the knock sequence in `knockd.conf` + Vault.
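
The key-revocation step can be sketched as a small helper. The name `revoke_key` is hypothetical, and matching on the key comment (for instance a `breakglass-YYYYMMDD` label) is an assumption about how the key line is identified.

```bash
#!/usr/bin/env bash
# Sketch: remove every authorized_keys line containing a marker string,
# keeping a backup copy next to the original file.
revoke_key() {
  local file="$1" marker="$2" tmp
  cp -p "$file" "$file.bak" || return 1
  tmp="$(mktemp)" || return 1
  # -F: fixed string, -v: keep only the lines that do NOT contain the marker
  grep -vF -- "$marker" "$file" > "$tmp"
  cat "$tmp" > "$file"   # overwrite in place so owner and mode are kept
  rm -f "$tmp"
}
```

On the host this might be run as `revoke_key /root/.ssh/authorized_keys breakglass-20260530`. Removing the line blocks new logins with that key; a session that is already established stays up until it is closed.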

## 9. Out of scope
- Host-down / site-down out-of-band access (IPMI, LTE) — a future tier.
- Phone access (would need an SSH **app**, e.g. Termius — outside the
  "pre-installed Linux/macOS" constraint; laptop is the target).

## 10. Docs to update on implementation
- `docs/architecture/vpn.md` — add a "Break-glass SSH" section.
- `docs/architecture/security.md` + Wave 1 note in `infra/.claude/CLAUDE.md` —
  record the deliberate knock-gated exception to "no public PVE sshd".
- New runbook `docs/runbooks/breakglass-ssh.md` — connect + rotate procedure.
395
docs/plans/2026-05-30-breakglass-ssh-access-plan.md
Normal file
@@ -0,0 +1,395 @@
# Break-Glass SSH Access — Implementation Plan

> **⚠️ SUPERSEDED 2026-06-11** by the redesign in
> `2026-06-11-breakglass-ssh-redesign-design.md` (port-knock removed). Retained
> for history. As-built: `docs/runbooks/breakglass-ssh.md`.

> **Execution model:** This plan mutates **live devices** (the Proxmox host's sshd, and the TP-Link edge router). It is **human-gated**, NOT for autonomous subagents. Each live step is applied with anti-lockout verification, and every edge-router change is made by Viktor (or by the browse tool with explicit per-change approval). Steps use `- [ ]` checkboxes.

**Goal:** Stand up a cold, brute-force-proof SSH backdoor onto the LAN — key-only SSH to the Proxmox host (`192.168.1.127`) gated behind a UDP port-knock — then decommission the legacy Synology SSH exposure and tighten UPnP.

**Architecture:** Edge router `.1` forwards a UDP knock sequence + TCP `52222` to the Proxmox host. The host runs `knockd` (libpcap) which opens `52222` for the knocker's IP for 30 s; `sshd` listens on `:22` (LAN, always) and `:52222` (external, knock-gated), key-only. Path bypasses pfSense + the k8s cluster. Client uses only stock `ssh` + `bash`.

**Tech stack:** OpenSSH, knockd, iptables, fail2ban (Debian/PVE host); TP-Link Archer AX6000 UI (edge router); HashiCorp Vault (secrets); Docker (`/home/wizard/tools/insecure-browse` for any router automation).

**Reference:** design doc `2026-05-30-breakglass-ssh-access-design.md`. Router audit (current `.1` forwards) recorded in task notes + `/home/wizard/tools/insecure-browse/out/`.

---

## Pre-flight (read before starting)

- **Anti-lockout rule:** never disable password auth or reload sshd without an *already-open* root session held + a *new* session verified. Applies to every host step.
- **Live-router rule:** all `.1` changes are made by Viktor in the UI (or browse-tool with explicit approval). No blind automation of router writes.
- **Ordering rule:** the legacy Synology SSH forward (Rule 6) is **not** closed until break-glass is verified working from an external network (Phase 4 gates on Phase 4-pre verification).
- **Host access:** PVE host reached as `ssh root@192.168.1.127` from the LAN.
- **Commit gate:** the infra repo currently has unmerged conflicts + an in-progress provider/backend migration. Do NOT commit (Phase 6) until Viktor confirms the repo is clean.

---

## Phase 0 — Generate secrets (no live changes)

### Task 0.1: Break-glass SSH keypair

**Files:** none in repo (secrets → Vault).

- [ ] **Step 1: Generate a dedicated ed25519 keypair (with passphrase)**

```bash
mkdir -p ~/.ssh
ssh-keygen -t ed25519 -a 100 -C "breakglass-$(date +%Y%m%d)" -f ~/.ssh/breakglass_ed25519
# set a passphrase when prompted (so a stolen laptop key isn't instantly usable)
```

- [ ] **Step 2: Store the private key + public key in Vault**

```bash
vault kv patch secret/viktor \
  breakglass_ssh_privkey=@$HOME/.ssh/breakglass_ed25519 \
  breakglass_ssh_pubkey="$(cat ~/.ssh/breakglass_ed25519.pub)"
```

- [ ] **Step 3: Verify the keys are retrievable**

```bash
vault kv get -field=breakglass_ssh_pubkey secret/viktor
```
Expected: prints the `ssh-ed25519 AAAA... breakglass-YYYYMMDD` line.

### Task 0.2: Knock sequence

- [ ] **Step 1: Generate 3 random UDP knock ports**

```bash
KNOCK="$(shuf -i 20000-60000 -n 3 | paste -sd, -)"; echo "$KNOCK"
```

- [ ] **Step 2: Store the sequence in Vault (keep it out of git)**

```bash
vault kv patch secret/viktor breakglass_knock_sequence="$KNOCK"
vault kv get -field=breakglass_knock_sequence secret/viktor
```
Expected: prints three comma-separated ports, e.g. `28411,49027,33180`.
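
Because the `20000-60000` range used above also contains `52222`, a quick sanity check on the generated sequence before storing it may be worthwhile. The sketch below is illustrative: `validate_knock` is a hypothetical helper, and rejecting `22` and `52222` is a conservative choice to keep knock ports distinct from the SSH ports, not a protocol requirement (the knocks are UDP, SSH is TCP).

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a comma-separated knock sequence such as "28411,49027,33180".
validate_knock() {
  local seq="$1" p count=0 seen=","
  local IFS=,          # split the sequence on commas in the for loop below
  for p in $seq; do
    case "$p" in ''|*[!0-9]*) echo "not a port: $p"; return 1 ;; esac
    if [ "$p" -lt 1 ] || [ "$p" -gt 65535 ]; then echo "out of range: $p"; return 1; fi
    if [ "$p" -eq 22 ] || [ "$p" -eq 52222 ]; then echo "collides with SSH port: $p"; return 1; fi
    case "$seen" in *",$p,"*) echo "duplicate: $p"; return 1 ;; esac
    seen="$seen$p,"
    count=$((count + 1))
  done
  [ "$count" -eq 3 ] || { echo "expected 3 ports, got $count"; return 1; }
  echo ok
}
```

A possible use is `validate_knock "$KNOCK" || echo "regenerate the sequence"` between Step 1 and Step 2.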

---

## Phase 1 — Proxmox host: key-only SSH + knock gate (LIVE host change)

> Run everything in this phase **on the PVE host**. Keep your current `ssh root@192.168.1.127` session open the entire phase.

### Task 1.1: Pre-checks (no changes yet)

- [ ] **Step 1: Confirm key login already works (anti-lockout baseline)**

The break-glass key is authorized later (Task 1.2). For now, from your laptop, confirm your *existing* admin key works:
```bash
ssh -o PasswordAuthentication=no root@192.168.1.127 'echo KEY_LOGIN_OK'
```
Expected: `KEY_LOGIN_OK` (key auth works → safe to disable passwords later). If it fails with `Permission denied` or prompts for a password, STOP and fix key auth first.

- [ ] **Step 2: Check whether the PVE firewall is active (coexistence)**

```bash
ssh root@192.168.1.127 'pve-firewall status 2>/dev/null; iptables -S | head'
```
Expected: note whether `Status: enabled/running`. If **enabled**, add the Phase-1.4 rules via PVE's firewall (Datacenter→Firewall) instead of raw iptables, OR disable it if unused. If **disabled** (common), proceed with the raw-iptables approach below.

### Task 1.2: Authorize the break-glass key

- [ ] **Step 1: Append the break-glass public key to root's authorized_keys**

```bash
PUB="$(vault kv get -field=breakglass_ssh_pubkey secret/viktor)"
ssh root@192.168.1.127 "grep -qF '$PUB' /root/.ssh/authorized_keys || echo '$PUB' >> /root/.ssh/authorized_keys"
```

- [ ] **Step 2: Verify break-glass key logs in (on :22, still default)**

```bash
ssh -i ~/.ssh/breakglass_ed25519 -o PasswordAuthentication=no root@192.168.1.127 'echo BREAKGLASS_KEY_OK'
```
Expected: `BREAKGLASS_KEY_OK`.
|
||||||
|
|
||||||
|
### Task 1.3: sshd dual-port + key-only
|
||||||
|
|
||||||
|
**Files:** Create on host: `/etc/ssh/sshd_config.d/10-breakglass.conf`
|
||||||
|
|
||||||
|
- [ ] **Step 1: Write the sshd drop-in**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh root@192.168.1.127 'cat > /etc/ssh/sshd_config.d/10-breakglass.conf' <<'EOF'
|
||||||
|
Port 22
|
||||||
|
Port 52222
|
||||||
|
PasswordAuthentication no
|
||||||
|
KbdInteractiveAuthentication no
|
||||||
|
PubkeyAuthentication yes
|
||||||
|
PermitRootLogin prohibit-password
|
||||||
|
MaxAuthTries 3
|
||||||
|
LoginGraceTime 20
|
||||||
|
EOF
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Validate config syntax (do NOT reload yet)**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh root@192.168.1.127 'sshd -t && echo SSHD_CONFIG_OK'
|
||||||
|
```
|
||||||
|
Expected: `SSHD_CONFIG_OK`. If error, fix the drop-in before reloading.
|
||||||
|
|
||||||
|
- [ ] **Step 3: Reload sshd (current session stays alive)**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh root@192.168.1.127 'systemctl reload ssh && echo RELOADED'
|
||||||
|
```
|
||||||
|
Expected: `RELOADED`.
|
||||||
|
|
||||||
|
- [ ] **Step 4: Verify a NEW key session works on :22 AND :52222 before trusting it**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh -i ~/.ssh/breakglass_ed25519 -p 22 root@192.168.1.127 'echo OK22'
|
||||||
|
ssh -i ~/.ssh/breakglass_ed25519 -p 52222 root@192.168.1.127 'echo OK52222'
|
||||||
|
```
|
||||||
|
Expected: `OK22` and `OK52222`. (If `:52222` refuses, sshd may not have bound the second port — check `ss -tlnp | grep ssh` on the host.) Only after both succeed, the old session is safe to drop.
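
If the bind check needs to be scripted, a minimal sketch could parse the `ss -tln` listing. `ports_listening` is a hypothetical helper, and it assumes the usual `ss` column layout (state in column 1, local address in column 4).

```bash
# Hypothetical helper: reads `ss -tln` output on stdin and succeeds only if
# every port given as an argument has a LISTEN socket.
ports_listening() {
  local listing port
  listing="$(cat)"
  for port in "$@"; do
    # Column 4 is the local address, e.g. 0.0.0.0:22 or [::]:52222.
    # Compare only the trailing ":<port>" so that :22 does not match :52222.
    printf '%s\n' "$listing" | awk -v p=":$port" '
      $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
      END { exit !found }' || return 1
  done
}

# Usage against the live host (not run here):
#   ssh root@192.168.1.127 'ss -tln' | ports_listening 22 52222 && echo BOTH_BOUND
```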
|
||||||
|
|
||||||
|
### Task 1.4: Base firewall (default-drop :52222, allow :22 + established)
|
||||||
|
|
||||||
|
**Files:** Create on host: `/usr/local/sbin/breakglass-firewall.sh`, `/etc/systemd/system/breakglass-firewall.service`
|
||||||
|
|
||||||
|
- [ ] **Step 1: Write the idempotent base-firewall script (dedicated chain)**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh root@192.168.1.127 'cat > /usr/local/sbin/breakglass-firewall.sh' <<'EOF'
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
set -euo pipefail
|
||||||
|
# Idempotent: (re)build a dedicated BREAKGLASS chain hooked into INPUT.
|
||||||
|
iptables -N BREAKGLASS 2>/dev/null || iptables -F BREAKGLASS
|
||||||
|
iptables -C INPUT -j BREAKGLASS 2>/dev/null || iptables -I INPUT 1 -j BREAKGLASS
|
||||||
|
# established/related always allowed
|
||||||
|
iptables -A BREAKGLASS -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
|
||||||
|
# LAN admin on :22 always allowed (.1 does NOT forward :22 to this host, so :22 is LAN-only)
|
||||||
|
iptables -A BREAKGLASS -p tcp --dport 22 -j ACCEPT
|
||||||
|
# external SSH on :52222 closed by default; knockd punches a per-source ACCEPT into INPUT pos 1
|
||||||
|
iptables -A BREAKGLASS -p tcp --dport 52222 -j DROP
|
||||||
|
EOF
|
||||||
|
ssh root@192.168.1.127 'chmod 0755 /usr/local/sbin/breakglass-firewall.sh'
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Write a boot-time systemd unit (persists across reboot, before knockd)**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh root@192.168.1.127 'cat > /etc/systemd/system/breakglass-firewall.service' <<'EOF'
|
||||||
|
[Unit]
|
||||||
|
Description=Break-glass base firewall (SSH knock gate)
|
||||||
|
After=network-pre.target
|
||||||
|
Before=knockd.service
|
||||||
|
Wants=network-pre.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=oneshot
|
||||||
|
ExecStart=/usr/local/sbin/breakglass-firewall.sh
|
||||||
|
RemainAfterExit=yes
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
ssh root@192.168.1.127 'systemctl daemon-reload && systemctl enable --now breakglass-firewall.service && echo FW_APPLIED'
|
||||||
|
```
|
||||||
|
Expected: `FW_APPLIED`.
|
||||||
|
|
||||||
|
- [ ] **Step 3: Verify LAN :22 still works and :52222 is now dropped from LAN**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh -i ~/.ssh/breakglass_ed25519 -p 22 root@192.168.1.127 'echo STILL_OK22' # works
|
||||||
|
nc -z -w3 192.168.1.127 52222 && echo "OPEN(bad)" || echo "CLOSED_AS_EXPECTED" # closed pre-knock
|
||||||
|
```
|
||||||
|
Expected: `STILL_OK22` and `CLOSED_AS_EXPECTED`.
|
||||||
|
|
||||||
|
### Task 1.5: knockd
|
||||||
|
|
||||||
|
**Files:** Create/modify on host: `/etc/knockd.conf`, `/etc/default/knockd`
|
||||||
|
|
||||||
|
- [ ] **Step 1: Install knockd (host daemon — must be native, not Docker, to manage host iptables)**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh root@192.168.1.127 'apt-get update -qq && apt-get install -y knockd && echo KNOCKD_INSTALLED'
|
||||||
|
```
|
||||||
|
Expected: `KNOCKD_INSTALLED`.
|
||||||
|
|
||||||
|
- [ ] **Step 2: Write knockd.conf with the Vault knock sequence (UDP)**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
KNOCK="$(vault kv get -field=breakglass_knock_sequence secret/viktor)" # e.g. 28411,49027,33180
|
||||||
|
read K1 K2 K3 <<<"$(echo "$KNOCK" | tr ',' ' ')"
|
||||||
|
ssh root@192.168.1.127 "cat > /etc/knockd.conf" <<EOF
|
||||||
|
[options]
|
||||||
|
UseSyslog
|
||||||
|
Interface = vmbr0
|
||||||
|
|
||||||
|
[breakglass]
|
||||||
|
sequence = ${K1}:udp,${K2}:udp,${K3}:udp
|
||||||
|
seq_timeout = 10
|
||||||
|
start_command = /usr/sbin/iptables -I INPUT 1 -s %IP% -p tcp --dport 52222 -j ACCEPT
|
||||||
|
cmd_timeout = 30
|
||||||
|
stop_command = /usr/sbin/iptables -D INPUT -s %IP% -p tcp --dport 52222 -j ACCEPT
|
||||||
|
EOF
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 3: Enable + start knockd**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh root@192.168.1.127 "sed -i 's/^START_KNOCKD=.*/START_KNOCKD=1/' /etc/default/knockd 2>/dev/null || echo 'START_KNOCKD=1' >> /etc/default/knockd"
|
||||||
|
ssh root@192.168.1.127 'systemctl enable --now knockd && systemctl is-active knockd'
|
||||||
|
```
|
||||||
|
Expected: `active`.
|
||||||
|
|
||||||
|
### Task 1.6: fail2ban (defense-in-depth)
|
||||||
|
|
||||||
|
- [ ] **Step 1: Install + enable fail2ban with the default sshd jail**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh root@192.168.1.127 'apt-get install -y fail2ban && systemctl enable --now fail2ban && fail2ban-client status sshd >/dev/null && echo F2B_OK'
|
||||||
|
```
|
||||||
|
Expected: `F2B_OK` (sshd jail active).
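
An active jail does not prove that it can ban: on Debian 13 the auth lines are logged by `sshd-session`, which a `_COMM=sshd` journal match never sees. A rough way to check which process names actually log under the unit is to tally `_COMM` values from the journal. `comm_counts` is a hypothetical helper, and it assumes `journalctl -o json` emits `"_COMM":"..."` pairs (optional spaces around the colon are tolerated).

```bash
# Hypothetical helper: reads `journalctl -o json` lines on stdin and prints
# one "<process-name>=<count>" line per distinct _COMM value.
comm_counts() {
  grep -oE '"_COMM" *: *"[^"]*"' \
    | sed -E 's/.*: *"([^"]*)"/\1/' \
    | sort | uniq -c \
    | awk '{ print $2 "=" $1 }'
}

# Usage against the live host (not run here):
#   ssh root@192.168.1.127 'journalctl -u ssh.service -o json --since -1d' | comm_counts
```

If nearly all entries come from `sshd-session`, a jail whose `journalmatch` requires `_COMM=sshd` will sit idle.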
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 2 — Edge router `.1` forwards (LIVE router change — Viktor executes)
|
||||||
|
|
||||||
|
> In the AX6000 UI: **Advanced → NAT Forwarding → Port Forwarding → Add**. Do NOT remove anything yet.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Add the SSH break-glass forward**
|
||||||
|
- Name `breakglass-ssh`, External Port `52222`, Internal IP `192.168.1.127`, Internal Port `52222`, Protocol `TCP`, Enable.
|
||||||
|
|
||||||
|
- [ ] **Step 2: Add the three UDP knock forwards** (values from `vault kv get -field=breakglass_knock_sequence secret/viktor`)
|
||||||
|
- For each of the 3 ports: Name `bg-knock-N`, External Port `<port>`, Internal IP `192.168.1.127`, Internal Port `<same port>`, Protocol `UDP`, Enable.
|
||||||
|
|
||||||
|
- [ ] **Step 3: (verify #1) Determine whether `.1` preserves source IP or SNATs**
|
||||||
|
|
||||||
|
After Phase 3 connects once, on the host check the observed source:
|
||||||
|
```bash
|
||||||
|
ssh root@192.168.1.127 'journalctl -u knockd -n 20 --no-pager | grep -i "stage\|open"'
|
||||||
|
```
|
||||||
|
If `%IP%` is a public IP → source preserved (per-IP granularity). If it's `192.168.1.1` → `.1` SNATs (knock opens `:52222` for the shared `.1` source during the 30 s window). Both are acceptable with the dual-port + key-only model; just note it in the runbook.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 3 — Client config (laptop, no live infra change)
|
||||||
|
|
||||||
|
**Files:** Modify `~/.ssh/config`; add a shell function to `~/.zshrc`/`~/.bashrc`.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Add the SSH host block**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cat >> ~/.ssh/config <<'EOF'
|
||||||
|
|
||||||
|
Host breakglass
|
||||||
|
HostName viktorbarzin.ddns.net
|
||||||
|
Port 52222
|
||||||
|
User root
|
||||||
|
IdentityFile ~/.ssh/breakglass_ed25519
|
||||||
|
EOF
|
||||||
|
```
|
||||||
|
(`viktorbarzin.ddns.net` is the router's NO-IP DDNS name — follows the dynamic WAN IP. Raw IP `176.12.22.76` is the fallback.)
|
||||||
|
|
||||||
|
- [ ] **Step 2: Add the knock+connect function**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cat >> ~/.zshrc <<'EOF'
|
||||||
|
|
||||||
|
bg() {
|
||||||
|
local host="viktorbarzin.ddns.net"
|
||||||
|
local seq; seq="$(vault kv get -field=breakglass_knock_sequence secret/viktor 2>/dev/null || echo "")"
|
||||||
|
[ -z "$seq" ] && { echo "no knock sequence (vault?)"; return 1; }
|
||||||
|
for p in $(printf '%s' "$seq" | tr ',' ' '); do printf x | nc -u -w1 "$host" "$p"; sleep 0.4; done
|
||||||
|
sleep 0.5
|
||||||
|
ssh breakglass "$@"
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
```
|
||||||
|
> Note: `/dev/udp/<host>/<port>` redirection is a bash-only feature (bash handles these paths internally); zsh does not implement it. In a zsh rc file, send each knock with `printf x | nc -u -w1 $host $p`, or define `bg` as a bash script instead.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 4-pre — Verify break-glass END-TO-END (gates Phase 4)
|
||||||
|
|
||||||
|
> Do this from an **external** network (phone hotspot / tethered), NOT the home LAN.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Without knocking, the port is silent**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
nc -z -w3 viktorbarzin.ddns.net 52222 && echo "OPEN(bad)" || echo "SILENT_OK"
|
||||||
|
```
|
||||||
|
Expected: `SILENT_OK`.
|
||||||
|
|
||||||
|
- [ ] **Step 2: Knock + connect succeeds**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bg 'hostname; echo BREAKGLASS_E2E_OK'
|
||||||
|
```
|
||||||
|
Expected: the PVE hostname + `BREAKGLASS_E2E_OK`.
|
||||||
|
|
||||||
|
- [ ] **Step 3: Full-LAN reach via the jump (no extra install)**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh -J breakglass root@10.0.20.1 'echo PFSENSE_REACHED' 2>/dev/null || echo "check pfSense ssh"
|
||||||
|
ssh -J breakglass admin@192.168.1.13 'echo SYNOLOGY_REACHED' 2>/dev/null || echo "check synology ssh"
|
||||||
|
```
|
||||||
|
Expected: confirms you can reach pfSense + Synology *through* break-glass (so closing Rule 6 loses nothing).
|
||||||
|
|
||||||
|
- [ ] **Step 4: LAN admin unaffected**
|
||||||
|
|
||||||
|
From the home LAN: `ssh -p 22 root@192.168.1.127 'echo LAN22_OK'` → `LAN22_OK`.
|
||||||
|
|
||||||
|
**GATE:** Only proceed to Phase 5 (router cleanup) once Steps 1–4 pass. If any fail, fix before removing the legacy forward.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 5 — Router cleanup (LIVE router change — Viktor executes, AFTER Phase 4-pre passes)
|
||||||
|
|
||||||
|
> AX6000 UI. One pass, all three changes.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Remove the Synology SSH exposure (Rule 6)**
|
||||||
|
- Advanced → NAT Forwarding → Port Forwarding → delete (or disable) rule **`HTTP` / 3333 → 192.168.1.13:22**.
|
||||||
|
|
||||||
|
- [ ] **Step 2: Delete the stale Proxmox rule (Rule 3)**
|
||||||
|
- Delete the disabled rule **`proxmox` / 8006 → 192.168.1.127**.
|
||||||
|
|
||||||
|
- [ ] **Step 3: Disable UPnP**
|
||||||
|
- Advanced → NAT Forwarding → UPnP → toggle **OFF**. (Tailscale on `.101` falls back to DERP relay; the `41643→pfSense` mapping drops.)
|
||||||
|
|
||||||
|
- [ ] **Step 4: Verify the Synology SSH is gone from the WAN, break-glass still works**
|
||||||
|
|
||||||
|
From an external network:
|
||||||
|
```bash
|
||||||
|
nc -z -w3 viktorbarzin.ddns.net 3333 && echo "STILL_OPEN(bad)" || echo "SYNOLOGY_SSH_CLOSED_OK"
|
||||||
|
bg 'echo BREAKGLASS_STILL_OK'
|
||||||
|
```
|
||||||
|
Expected: `SYNOLOGY_SSH_CLOSED_OK` and `BREAKGLASS_STILL_OK`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 6 — Docs + commit (AFTER infra repo is clean)
|
||||||
|
|
||||||
|
- [ ] **Step 1: Update `docs/architecture/vpn.md`** — add a "Break-glass SSH" section (knock-gated SSH to PVE host, client `bg()`, cheat-sheet IPs).
|
||||||
|
- [ ] **Step 2: Update `docs/architecture/security.md` + the Wave-1 note in `infra/.claude/CLAUDE.md`** — record the deliberate knock-gated exception; **correct the WAN-exposure inventory** (actual `.1` forwards are qbittorrent/stun/turn→pfSense + the new break-glass; Synology SSH removed; UPnP disabled; Remote Management off).
|
||||||
|
- [ ] **Step 3: New runbook `docs/runbooks/breakglass-ssh.md`** — connect procedure, knock/key rotation, re-adding `.1` forwards after a router reset.
|
||||||
|
- [ ] **Step 4: Commit the design + plan + doc updates** (only once Viktor confirms the repo is committable):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git -C /home/wizard/code/infra add \
|
||||||
|
docs/plans/2026-05-30-breakglass-ssh-access-design.md \
|
||||||
|
docs/plans/2026-05-30-breakglass-ssh-access-plan.md \
|
||||||
|
docs/architecture/vpn.md docs/architecture/security.md \
|
||||||
|
docs/runbooks/breakglass-ssh.md .claude/CLAUDE.md
|
||||||
|
git -C /home/wizard/code/infra commit -m "docs+feat: break-glass knock-gated SSH; retire Synology SSH forward; disable UPnP [ci skip]"
|
||||||
|
git -C /home/wizard/code/infra push origin master
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Self-review
|
||||||
|
|
||||||
|
- **Spec coverage:** key-only SSH ✅ (1.3), knock gate ✅ (1.4/1.5), invisibility ✅ (4-pre.1), full-LAN via jump ✅ (4-pre.3), no-lockout ✅ (1.1/1.3.4), Wave-1 exception doc ✅ (6.2), close legacy SSH ✅ (5.1), UPnP ✅ (5.3). All design §sections map to a task.
|
||||||
|
- **Placeholder scan:** no TBDs; secret values are generated + Vault-stored, referenced via `vault kv get` (concrete, not placeholders).
|
||||||
|
- **Consistency:** port `52222`, knock from `secret/viktor/breakglass_knock_sequence`, key `~/.ssh/breakglass_ed25519`, host `192.168.1.127` used consistently throughout.
|
||||||
|
- **Open verify items** (flagged inline, non-blocking): #1 `.1` SNAT behaviour (2.3), pve-firewall coexistence (1.1.2).
|
||||||
73
docs/plans/2026-06-11-breakglass-ssh-redesign-design.md
Normal file
|
|
@ -0,0 +1,73 @@
|
||||||
|
# Break-glass SSH — Redesign
|
||||||
|
|
||||||
|
- **Date**: 2026-06-11
|
||||||
|
- **Status**: Implemented
|
||||||
|
- **Owner**: Viktor
|
||||||
|
- **Supersedes**: `2026-05-30-breakglass-ssh-access-{design,plan}.md` (port-knock design)
|
||||||
|
- **As-built runbook**: `docs/runbooks/breakglass-ssh.md`
|
||||||
|
|
||||||
|
## Why redesign
|
||||||
|
|
||||||
|
The 2026-05-30 design gated a key-only SSH port on the Proxmox host behind a UDP
|
||||||
|
**port-knock** (knockd). It caused a real lockout, for a structural reason:
|
||||||
|
|
||||||
|
- The knock sequence was 3 random ports stored **only** in Vault, and the client
|
||||||
|
helper fetched it from Vault at connect time.
|
||||||
|
- **Vault is in-cluster** and not publicly reachable (Wave-1 policy). In the
|
||||||
|
exact scenario break-glass exists for — away from home, cluster/tunnels down —
|
||||||
|
the knock sequence is unreachable and unmemorable. Circular dependency.
|
||||||
|
|
||||||
|
The knock's only benefit was hiding an already brute-force-proof port; its cost
|
||||||
|
was that fragility. For a *recovery* path, robustness beats stealth.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
**Plain key-only SSH to the Proxmox host on `:52222`, openly reachable, no knock.**
|
||||||
|
Hardened with: the exposed port trusts only a dedicated break-glass key
|
||||||
|
(`Match LocalPort`), per-source connection rate-limiting (iptables hashlimit),
|
||||||
|
and fail2ban. Scenario covered: *cluster + tunnels down, host + pfSense + router
|
||||||
|
up* (the common "I'm away and need in" case — confirmed with Viktor; deeper
|
||||||
|
"pfSense wedged" / "host down" tiers are explicitly out of scope).
|
||||||
|
|
||||||
|
Alternatives considered and rejected: keeping the knock (fragile, circular);
|
||||||
|
Tailscale-on-pfSense (briefly chosen, then dropped — reintroduces the upstream
|
||||||
|
dependency Headscale is self-hosted to avoid, and the user preferred a
|
||||||
|
self-contained stock-ssh path); WireGuard road-warrior (needs a client, and the
|
||||||
|
self-contained SSH path was preferred).
|
||||||
|
|
||||||
|
## Components
|
||||||
|
|
||||||
|
| Layer | Change | Source of truth |
|
||||||
|
|---|---|---|
|
||||||
|
| sshd | dual-port `:22` (LAN, all keys) + `:52222` (WAN, break-glass key only via `Match LocalPort`, terminated by `Match all`); key-only everywhere | `scripts/sshd-10-breakglass.conf` |
|
||||||
|
| host firewall | `BREAKGLASS` chain: `:52222` rate-limited per source, LAN bypass; replaced the knock-gated default-DROP | `scripts/breakglass-firewall.sh` (+ `breakglass-firewall.service`) |
|
||||||
|
| fail2ban | jail fixed for Debian 13 (`journalmatch` by unit, not `_COMM=sshd`, else it never bans), bans on `:22`+`:52222` | `scripts/fail2ban-breakglass-sshd.local` |
|
||||||
|
| knockd | **removed** (package purged, config deleted) | — |
|
||||||
|
| edge router | `breakglass-ssh` WAN tcp/52222 → 192.168.1.127:52222; **removed** legacy Synology SSH forward (ext 3333 → .13:22) | manual (live device) |
|
||||||
|
| Vault | `breakglass_ssh_{pub,priv}key` retained; `breakglass_knock_sequence` now dead | `secret/viktor` |
|
||||||
|
|
||||||
|
## Edge-router constraints discovered (TP-Link AX6000)
|
||||||
|
|
||||||
|
- **No port remapping** — external port must equal internal port (rejects e.g.
|
||||||
|
`22 → 52222` as a "conflict"). All forwards are ext==int; hence `:52222` both
|
||||||
|
sides.
|
||||||
|
- **Port 22 is reserved** — `22 → 22` is also refused. Break-glass cannot use 22
|
||||||
|
(Viktor's initial preference); `:52222` is the landed port.
|
||||||
|
- **Row delete is immediate** (no confirm dialog).
|
||||||
|
|
||||||
|
## Security posture
|
||||||
|
|
||||||
|
- **Brute force: not feasible** (key-only; password and keyboard-interactive auth are disabled).
|
||||||
|
- **Scannable: yes** — deliberate, documented Wave-1 exception (`security.md`).
|
||||||
|
- **Residual risks:** sshd 0-day during exposure (mitigate: patch, rate-limit,
|
||||||
|
fail2ban, low MaxAuthTries); break-glass key theft (revoke by removing the
|
||||||
|
`authorized_keys.breakglass` line). Logins are audited (PVE ships sshd auth +
|
||||||
|
snoopy execve to Loki).
|
||||||
|
|
||||||
|
## Verification (2026-06-11)
|
||||||
|
|
||||||
|
- `:52222` reachable; break-glass key authenticates (`root@pve`).
|
||||||
|
- Non-break-glass keys **rejected** on `:52222` (Match isolation works).
|
||||||
|
- `:22` LAN admin unaffected (Match all reset confirmed — global root login intact).
|
||||||
|
- Full WAN path: `ssh -p 52222 <WAN-IP>` with the break-glass key → `root@pve`.
|
||||||
|
- knockd gone; fail2ban jail matches Debian 13 `sshd-session` lines.
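
The Match-isolation check can also be done without a login attempt, by asking sshd for its effective configuration per port. The sketch below is an assumption-laden illustration, not the procedure that was used on 2026-06-11: `effective_keys_file` is a hypothetical helper, and the `-C` connection spec values (`user`, `host`, `addr`) are placeholders.

```bash
# Hypothetical helper: reads `sshd -T` output on stdin and prints the value
# of the effective AuthorizedKeysFile setting (sshd -T prints lowercase keys).
effective_keys_file() {
  awk '$1 == "authorizedkeysfile" { $1 = ""; sub(/^ /, ""); print }'
}

# Usage against the live host (not run here); lport selects the Match block:
#   ssh root@192.168.1.127 \
#     'sshd -T -C user=root,host=localhost,addr=127.0.0.1,lport=52222' | effective_keys_file
#   ssh root@192.168.1.127 \
#     'sshd -T -C user=root,host=localhost,addr=127.0.0.1,lport=22' | effective_keys_file
```

The `:52222` query should print only the `authorized_keys.breakglass` path, while the `:22` query should print the default key files.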
|
||||||
158
docs/runbooks/breakglass-ssh.md
Normal file
|
|
@ -0,0 +1,158 @@
|
||||||
|
# Runbook: Break-glass SSH
|
||||||
|
|
||||||
|
Cold-survivable, brute-force-proof SSH onto the home LAN for when the Kubernetes
|
||||||
|
cluster and its remote-access tunnels (Headscale, cloudflared) are down but the
|
||||||
|
**Proxmox host + edge router are up**. Redesigned 2026-06-11 — the previous
|
||||||
|
port-knock design is decommissioned (see "History" below).
|
||||||
|
|
||||||
|
## Model (as built)
|
||||||
|
|
||||||
|
```
|
||||||
|
your laptop (anywhere) ── ssh -p 52222 ──▶ edge router 192.168.1.1
                                              │  WAN tcp/52222 ─▶ 192.168.1.127:52222
                                              ▼
                                           Proxmox host 192.168.1.127
                                              sshd :52222 (key-only, break-glass key ONLY)
                                              → full LAN via ssh -J / ssh -D
|
||||||
|
```
|
||||||
|
|
||||||
|
- **No port-knock.** Plain `ssh -p 52222`. The SSH key is the only gate.
|
||||||
|
- **Key-only**, brute-force-proof. The exposed `:52222` trusts **only** the
|
||||||
|
dedicated break-glass key (`/root/.ssh/authorized_keys.breakglass`), separate
|
||||||
|
from root's normal LAN-admin keys, so it is independently revocable and a leak
|
||||||
|
of any other root key does not grant internet access.
|
||||||
|
- **Rate-limited** per source IP (iptables hashlimit) + **fail2ban**. These trim
|
||||||
|
scanner noise only; key-only auth is the real protection.
|
||||||
|
- **Exposed, not hidden.** `:52222` answers on the WAN (Shodan-visible). This is
|
||||||
|
a deliberate, documented exception to the Wave-1 "no public-IP access" policy
|
||||||
|
(see `docs/architecture/security.md`), chosen for self-containment: it has **no
|
||||||
|
dependency on the cluster** (unlike Headscale/cloudflared) and nothing to
|
||||||
|
remember (unlike the old knock, whose sequence lived only in in-cluster Vault).
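
To get a feel for what the per-source limit in `scripts/breakglass-firewall.sh` (`6/min`, burst `3`) means in practice, here is a simplified token-bucket simulation. It is a sketch for intuition only, not the kernel's exact hashlimit accounting; `simulate_rate_limit` is a hypothetical name, and it assumes one token refills every 10 s with a bucket that holds 3.

```bash
# Simplified token bucket: 6/min => one token per 10 s; burst 3 => bucket of 3.
# Arguments are arrival times (whole seconds, ascending) of NEW connections
# from one source IP. Prints A (accepted) or D (dropped) per arrival.
simulate_rate_limit() {
  local per_token=10 burst=3
  local cap=$((per_token * burst))
  local credit=$cap last=0 t out=""
  for t in "$@"; do
    credit=$((credit + t - last))           # refill for the elapsed time
    if (( credit > cap )); then credit=$cap; fi
    last=$t
    if (( credit >= per_token )); then
      credit=$((credit - per_token)); out+="A"
    else
      out+="D"
    fi
  done
  printf '%s\n' "$out"
}

simulate_rate_limit 0 1 2 3 4 14    # a quick burst, then one retry 10 s later
```

Under this model a human retrying a few times is never affected, while a scanner opening connections every second is cut off after the third attempt.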
|
||||||
|
|
||||||
|
## Secrets (Vault `secret/viktor`)
|
||||||
|
|
||||||
|
| Key | Use |
|
||||||
|
|---|---|
|
||||||
|
| `breakglass_ssh_pubkey` | authorized on the host (`authorized_keys.breakglass`) |
|
||||||
|
| `breakglass_ssh_privkey` | the private key (also on your laptop at `~/.ssh/breakglass_ed25519`) |
|
||||||
|
|
||||||
|
The key has **no passphrase** (so it works in a true cold event without anything
|
||||||
|
to recall). Treat the private key as the sole credential — guard the laptop copy.
|
||||||
|
|
||||||
|
> Leftover: `breakglass_knock_sequence` is dead (knock decommissioned). It is
|
||||||
|
> inert; remove it when you have a Vault token with the `patch` capability
|
||||||
|
> (`vault kv patch` / merge-patch — the everyday token lacks it).
|
||||||
|
|
||||||
|
## Connect
|
||||||
|
|
||||||
|
Client `~/.ssh/config`:
|
||||||
|
|
||||||
|
```
|
||||||
|
Host breakglass
|
||||||
|
HostName viktorbarzin.ddns.net # follows the dynamic WAN IP
|
||||||
|
Port 52222
|
||||||
|
User root
|
||||||
|
IdentityFile ~/.ssh/breakglass_ed25519
|
||||||
|
IdentitiesOnly yes
|
||||||
|
```
|
||||||
|
|
||||||
|
Then:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh breakglass # shell on the Proxmox host
|
||||||
|
ssh -J breakglass root@10.0.20.1 # jump to pfSense (or any LAN host)
|
||||||
|
ssh -D 1080 breakglass # SOCKS5 → reach any internal IP
|
||||||
|
```
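
When a connect attempt hangs, it helps to separate "port unreachable" (router forward missing, host down) from "key rejected". A minimal preflight sketch, assuming bash and coreutils `timeout` on the client; `port_open` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds if a TCP connection to host:port opens
# within 3 seconds. Uses bash's built-in /dev/tcp redirection.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage from an external network (not run here):
#   port_open viktorbarzin.ddns.net 52222 && ssh breakglass || echo "52222 unreachable: check the router forward"
```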
|
||||||
|
|
||||||
|
There is **no `bg()` knock function** anymore — delete it from your shell rc if
|
||||||
|
you added it under the old design.
|
||||||
|
|
||||||
|
## Cold-event IP cheat sheet (cluster DNS is down)
|
||||||
|
|
||||||
|
| Host | IP |
|
||||||
|
|---|---|
|
||||||
|
| Proxmox host | `192.168.1.127` |
|
||||||
|
| pfSense | `10.0.20.1` (WAN `192.168.1.2`) |
|
||||||
|
| k8s API | `10.0.20.100` |
|
||||||
|
| Synology NAS | `192.168.1.13` (reach via `ssh -J breakglass`) |
|
||||||
|
| edge router | `192.168.1.1` |
|
||||||
|
|
||||||
|
## Deploy / re-provision the host config
|
||||||
|
|
||||||
|
Source of truth lives in `infra/scripts/`. To (re)deploy:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. break-glass key authorized for the exposed port
|
||||||
|
PUB="$(vault kv get -field=breakglass_ssh_pubkey secret/viktor)"
|
||||||
|
ssh root@192.168.1.127 "printf '%s\n' '$PUB' > /root/.ssh/authorized_keys.breakglass && chmod 600 /root/.ssh/authorized_keys.breakglass"
|
||||||
|
|
||||||
|
# 2. sshd drop-in (dual-port, Match-isolated) — validate before reload (anti-lockout)
|
||||||
|
scp scripts/sshd-10-breakglass.conf root@192.168.1.127:/etc/ssh/sshd_config.d/10-breakglass.conf
|
||||||
|
ssh root@192.168.1.127 'sshd -t && systemctl reload ssh'
|
||||||
|
|
||||||
|
# 3. firewall (rate-limit) + boot unit
|
||||||
|
scp scripts/breakglass-firewall.sh root@192.168.1.127:/usr/local/sbin/breakglass-firewall.sh
|
||||||
|
ssh root@192.168.1.127 'chmod 0755 /usr/local/sbin/breakglass-firewall.sh && systemctl enable breakglass-firewall.service && systemctl restart breakglass-firewall.service'
|
||||||
|
|
||||||
|
# 4. fail2ban jail
|
||||||
|
scp scripts/fail2ban-breakglass-sshd.local root@192.168.1.127:/etc/fail2ban/jail.d/breakglass-sshd.local
|
||||||
|
ssh root@192.168.1.127 'systemctl restart fail2ban && fail2ban-client status sshd'
|
||||||
|
```
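
Because the host config was once applied ad hoc, it can be worth checking that the host still matches the repo before trusting it. A minimal drift-check sketch follows; `check_drift` and `fetch_remote` are hypothetical helpers, and the `HOST` default is taken from this runbook.

```bash
HOST="${HOST:-root@192.168.1.127}"

# Prints the remote file's content. Kept separate so it can be swapped out.
fetch_remote() { ssh "$HOST" "cat '$1'"; }

# Compares a repo file with its deployed copy, byte for byte.
check_drift() {
  local local_file="$1" remote_path="$2"
  if fetch_remote "$remote_path" | cmp -s - "$local_file"; then
    echo "in-sync  $remote_path"
  else
    echo "DRIFT    $remote_path"
    return 1
  fi
}

# Usage from the repo root (not run here):
#   check_drift scripts/sshd-10-breakglass.conf        /etc/ssh/sshd_config.d/10-breakglass.conf
#   check_drift scripts/breakglass-firewall.sh         /usr/local/sbin/breakglass-firewall.sh
#   check_drift scripts/fail2ban-breakglass-sshd.local /etc/fail2ban/jail.d/breakglass-sshd.local
```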
|
||||||
|
|
||||||
|
The `breakglass-firewall.service` unit (oneshot, `RemainAfterExit=yes`,
|
||||||
|
ordered after `network-pre.target`) is a manual host unit; recreate it if the
|
||||||
|
host is rebuilt:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
[Unit]
|
||||||
|
Description=Break-glass base firewall (key-only SSH on :52222)
|
||||||
|
After=network-pre.target
|
||||||
|
Wants=network-pre.target
|
||||||
|
[Service]
|
||||||
|
Type=oneshot
|
||||||
|
ExecStart=/usr/local/sbin/breakglass-firewall.sh
|
||||||
|
RemainAfterExit=yes
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
```
|
||||||
|
|
||||||
|
## Edge-router forward (manual — live device, not Terraform)
|
||||||
|
|
||||||
|
TP-Link Archer AX6000 (`192.168.1.1`) → Advanced → NAT Forwarding → Port
|
||||||
|
Forwarding. The break-glass rule:
|
||||||
|
|
||||||
|
| Service Name | Device IP | External Port | Internal Port | Protocol |
|
||||||
|
|---|---|---|---|---|
|
||||||
|
| `breakglass-ssh` | `192.168.1.127` | `52222` | `52222` | TCP |
|
||||||
|
|
||||||
|
**AX6000 quirks (learned 2026-06-11 — do not relearn the hard way):**
|
||||||
|
- **External port must equal internal port.** The firmware rejects any remap
|
||||||
|
(e.g. `22 → 52222`) with *"External Port: This item conflicts with existed
|
||||||
|
ones."* Hence ext==int 52222.
|
||||||
|
- **Port 22 is reserved** — even `22 → 22` is refused. Break-glass cannot use 22.
|
||||||
|
- **Row delete is immediate** (no confirm dialog) — clicking the trash icon
|
||||||
|
removes the rule and toasts "Operation succeeded".
|
||||||
|
- Automation: `~/wizard/tools/insecure-browse/add-forward.{sh,js}` (dockerized
|
||||||
|
Playwright; double-gated save `DRY_RUN=0 CONFIRM_SAVE=1`; supports
|
||||||
|
`RULES_JSON` add, `EDIT_RULES_JSON` protocol-edit, `DELETE_RULES_JSON`
|
||||||
|
identity-guarded delete). Router password: Vault
|
||||||
|
`secret/viktor/edge_router_192_168_1_1_password`.
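
The two firmware constraints above are easy to encode as a pre-check before touching the router UI. This is an illustrative sketch of the rules as observed on this AX6000, not a statement about TP-Link firmware in general; `valid_forward` is a hypothetical helper.

```bash
# Hypothetical pre-check for a planned forward: valid_forward <external> <internal>
valid_forward() {
  local ext="$1" int="$2"
  if [ "$ext" != "$int" ]; then
    echo "rejected: external port must equal internal port"
    return 1
  fi
  if [ "$ext" = "22" ]; then
    echo "rejected: port 22 is reserved"
    return 1
  fi
  echo "ok: $ext -> $int"
}

valid_forward 52222 52222          # the break-glass rule
valid_forward 22 52222 || true     # a remap, which the router refuses
```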
|
||||||
|
|
||||||
|
## Rotate / revoke
|
||||||
|
|
||||||
|
- **Revoke instantly:** remove the line from `/root/.ssh/authorized_keys.breakglass`.
|
||||||
|
- **Rotate the key:** `ssh-keygen -t ed25519 -a 100 -f ~/.ssh/breakglass_ed25519`,
|
||||||
|
`vault kv patch secret/viktor breakglass_ssh_privkey=@... breakglass_ssh_pubkey=...`,
|
||||||
|
redeploy step 1 above.
|
||||||
|
- **Router reset wipes forwards:** re-add the `breakglass-ssh` rule above.
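
Revocation is a one-line file edit, which can be sketched as below. `revoke_key` is a hypothetical helper; the match string is whatever uniquely identifies the key line (for example part of the base64 blob or the key comment).

```bash
# Hypothetical helper: revoke_key <authorized_keys-file> <fixed-string>
# Removes every line containing the string, keeping the file's owner and mode.
revoke_key() {
  local file="$1" needle="$2" tmp
  tmp="$(mktemp)"
  grep -vF -- "$needle" "$file" > "$tmp" || true   # grep exits 1 when no lines remain
  cat "$tmp" > "$file"                             # overwrite in place, do not replace the inode
  rm -f "$tmp"
}

# Usage on the host (not run here):
#   revoke_key /root/.ssh/authorized_keys.breakglass '<unique part of the key line>'
```

No sshd reload is needed afterwards, since `AuthorizedKeysFile` is read on each authentication attempt.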
|
||||||
|
|
||||||
|
## History
|
||||||
|
|
||||||
|
- **2026-05-30:** original design — key-only SSH on `:52222` gated behind a
|
||||||
|
**UDP port-knock** (knockd). Decommissioned 2026-06-11: the knock added no real
|
||||||
|
security (the SSH key already makes the port brute-force-proof) and its only
|
||||||
|
benefit — hiding the port — came at the cost of a **circular dependency**: the
|
||||||
|
knock sequence lived only in in-cluster Vault, unreachable in the exact
|
||||||
|
cold/away scenario break-glass exists for. That caused a real lockout. The
|
||||||
|
knockd package + config + the legacy Synology SSH forward (ext 3333 → .13:22)
|
||||||
|
were removed.
|
||||||
26
scripts/breakglass-firewall.sh
Normal file
|
|
@ -0,0 +1,26 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
set -euo pipefail
|
||||||
|
# Break-glass base firewall (redesigned 2026-06-11; replaced the port-knock gate).
|
||||||
|
#
|
||||||
|
# Source of truth. Deploy to the PVE host with:
|
||||||
|
# scp scripts/breakglass-firewall.sh root@192.168.1.127:/usr/local/sbin/breakglass-firewall.sh
|
||||||
|
# ssh root@192.168.1.127 'chmod 0755 /usr/local/sbin/breakglass-firewall.sh && systemctl restart breakglass-firewall.service'
|
||||||
|
# The breakglass-firewall.service oneshot runs this at boot (RemainAfterExit).
|
||||||
|
#
|
||||||
|
# Model: key-only SSH break-glass on :52222, openly reachable from the WAN, NO
|
||||||
|
# port-knock. The SSH key is the gate (brute-force-proof); the rate-limit below
|
||||||
|
# only trims scanner noise / slows a hypothetical sshd 0-day.
|
||||||
|
# :22 -> LAN admin (all of root's keys), always allowed.
|
||||||
|
# :52222 -> WAN break-glass. LAN/VLAN sources bypass the limit; external NEW
|
||||||
|
# connections are rate-limited per source IP, then accepted.
|
||||||
|
iptables -N BREAKGLASS 2>/dev/null || iptables -F BREAKGLASS
|
||||||
|
iptables -C INPUT -j BREAKGLASS 2>/dev/null || iptables -I INPUT 1 -j BREAKGLASS
|
||||||
|
|
||||||
|
iptables -A BREAKGLASS -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
|
||||||
|
iptables -A BREAKGLASS -p tcp --dport 22 -j ACCEPT
|
||||||
|
iptables -A BREAKGLASS -p tcp --dport 52222 -s 192.168.1.0/24 -j ACCEPT
|
||||||
|
iptables -A BREAKGLASS -p tcp --dport 52222 -s 10.0.0.0/8 -j ACCEPT
|
||||||
|
iptables -A BREAKGLASS -p tcp --dport 52222 -m conntrack --ctstate NEW \
|
||||||
|
-m hashlimit --hashlimit-name bg_ssh --hashlimit-mode srcip \
|
||||||
|
--hashlimit-above 6/min --hashlimit-burst 3 -j DROP
|
||||||
|
iptables -A BREAKGLASS -p tcp --dport 52222 -j ACCEPT
|
||||||
18
scripts/fail2ban-breakglass-sshd.local
Normal file
|
|
@ -0,0 +1,18 @@
|
||||||
|
# Break-glass SSH fail2ban jail (redesigned 2026-06-11). Source of truth.
|
||||||
|
# Deploy to the PVE host with:
|
||||||
|
# scp scripts/fail2ban-breakglass-sshd.local root@192.168.1.127:/etc/fail2ban/jail.d/breakglass-sshd.local
|
||||||
|
# ssh root@192.168.1.127 'systemctl restart fail2ban'
|
||||||
|
#
|
||||||
|
# GOTCHA (Debian 13 / OpenSSH 9.8+): auth lines are logged under
|
||||||
|
# _COMM=sshd-session, NOT _COMM=sshd. The stock Debian jail keys journalmatch on
|
||||||
|
# `_SYSTEMD_UNIT=ssh.service + _COMM=sshd` and therefore silently NEVER bans.
|
||||||
|
# Match by unit only so both sshd and sshd-session lines are seen. Ban on both
|
||||||
|
# SSH ports (the WAN break-glass listener is :52222).
|
||||||
|
[sshd]
|
||||||
|
enabled = true
|
||||||
|
backend = systemd
|
||||||
|
journalmatch = _SYSTEMD_UNIT=ssh.service
|
||||||
|
port = ssh,52222
|
||||||
|
maxretry = 4
|
||||||
|
findtime = 10m
|
||||||
|
bantime = 1h
|
||||||
31
scripts/sshd-10-breakglass.conf
Normal file
|
|
@ -0,0 +1,31 @@
|
||||||
|
# Break-glass SSH drop-in (redesigned 2026-06-11). Source of truth.
|
||||||
|
# Deploy to the PVE host with:
|
||||||
|
# scp scripts/sshd-10-breakglass.conf root@192.168.1.127:/etc/ssh/sshd_config.d/10-breakglass.conf
|
||||||
|
# ssh root@192.168.1.127 'sshd -t && systemctl reload ssh'
|
||||||
|
#
|
||||||
|
# :22 = LAN admin, all of root's keys (default AuthorizedKeysFile).
|
||||||
|
# :52222 = WAN-exposed break-glass. The edge router forwards WAN tcp/52222 ->
|
||||||
|
# 192.168.1.127:52222 (external port MUST equal internal port on the
|
||||||
|
# TP-Link AX6000 — it rejects remaps; port 22 itself is reserved).
|
||||||
|
# The Match LocalPort block trusts ONLY the dedicated break-glass key
|
||||||
|
# (authorized_keys.breakglass), so a leak of any other root key does
|
||||||
|
# NOT grant internet access. Rate-limited by the BREAKGLASS iptables
|
||||||
|
# chain + fail2ban. No port-knock.
|
||||||
|
#
|
||||||
|
# NOTE: the trailing `Match all` is REQUIRED. /etc/ssh/sshd_config has
|
||||||
|
# `Include sshd_config.d/*.conf` near the top but a global `PermitRootLogin`
|
||||||
|
# further down; without `Match all` resetting context, that later global
|
||||||
|
# directive would be swallowed into the `Match LocalPort 52222` condition.
|
||||||
|
Port 22
|
||||||
|
Port 52222
|
||||||
|
PasswordAuthentication no
|
||||||
|
KbdInteractiveAuthentication no
|
||||||
|
PubkeyAuthentication yes
|
||||||
|
PermitRootLogin prohibit-password
|
||||||
|
MaxAuthTries 3
|
||||||
|
LoginGraceTime 20
|
||||||
|
|
||||||
|
Match LocalPort 52222
|
||||||
|
AuthorizedKeysFile /root/.ssh/authorized_keys.breakglass
|
||||||
|
PermitRootLogin prohibit-password
|
||||||
|
Match all
|
||||||