break-glass SSH: drop port-knock for exposed key-only :52222; version host config

Viktor got locked out of the break-glass path (forgot the port-knock setup) and deleted the edge-router forwards, then asked to review and redesign it from scratch.

Root cause of the lockout: the knock added no real security (key-only SSH is already brute-force-proof), and its only benefit, hiding the port, came at the cost of a circular dependency. The knock sequence lived only in in-cluster Vault, which is unreachable in the exact away/cold scenario break-glass exists for. So the unlock secret was unavailable precisely when needed.

New model (self-contained, nothing to remember): plain key-only SSH on the Proxmox host's :52222, openly reachable. The edge router forwards WAN tcp/52222 -> 192.168.1.127:52222 (external port MUST equal internal on the TP-Link AX6000 - it rejects remaps; port 22 itself is reserved). The exposed port trusts only a dedicated break-glass key via `Match LocalPort` (a leak of any other root key does not grant internet access) and is rate-limited (iptables hashlimit) + fail2ban.

- Removed knockd (package + config) and the legacy Synology SSH forward (ext 3333 -> .13:22, a needless WAN exposure the original plan wanted gone).
- Fixed the fail2ban jail for Debian 13 (auth logs under sshd-session, not sshd - the stock journalmatch silently never banned).
- Versioned the host config in scripts/ (it was applied ad-hoc, never committed) and recorded the deliberate Wave-1 "no public-IP" exception in security.md + .claude/CLAUDE.md.

Superseded the 2026-05-30 port-knock design docs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:23:39 +00:00 · 2026-06-11 18:23:39 +00:00 · df332b59e6
commit df332b59e6
parent e2788d1b2d
9 changed files with 989 additions and 1 deletions
--- a/docs/architecture/security.md
+++ b/docs/architecture/security.md
@@ -255,6 +255,8 @@ Routed via **Loki ruler → Alertmanager → `#security` Slack receiver**. Same

 **Policy: no public-IP access ever.** Vault, kube-apiserver, PVE sshd must transit a trusted LAN or Headscale. Anything else fires an alert.

+**Documented exception — break-glass SSH (2026-06-11):** one deliberate carve-out. The Proxmox host's sshd listens on a WAN-exposed `:52222` (edge-router forward), **key-only**, trusting only a dedicated break-glass key (`Match LocalPort` → `authorized_keys.breakglass`), rate-limited (iptables hashlimit) + fail2ban. It is intentionally reachable from the public internet so it survives a cluster/tunnel outage with no dependency on the cluster — the one case the "must transit LAN/Headscale" rule cannot serve. Brute-force-proof (no password); the trade is Shodan-visibility. As-built: `docs/runbooks/breakglass-ssh.md`; rationale: `docs/plans/2026-06-11-breakglass-ssh-redesign-design.md`. (Replaced the 2026-05-30 port-knock variant, which was non-scannable but had a circular Vault dependency that caused a lockout.)
+
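A minimal sketch of how the `Match LocalPort` carve-out described above could be rendered as an sshd drop-in. The helper name `render_breakglass_match` and the `/root/.ssh/` directory are assumptions for illustration (only the filename `authorized_keys.breakglass` appears in the text); the as-built config lives in the runbook and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an sshd_config fragment that makes the
# WAN-facing port trust only a dedicated authorized_keys file.
# Args: $1 = port (default 52222), $2 = key file path (assumed location).
render_breakglass_match() {
  local port="${1:-52222}"
  local keyfile="${2:-/root/.ssh/authorized_keys.breakglass}"
  cat <<EOF
Match LocalPort ${port}
    AuthorizedKeysFile ${keyfile}
    PasswordAuthentication no
    KbdInteractiveAuthentication no
EOF
}
```

Redirect the output into a drop-in of your choosing and validate it with `sshd -t` before reloading; connections on any other local port keep using the default `authorized_keys`.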
 #### Why no canary tokens

 Original plan included canary tokens (fake K8s Secret, Vault KV path, PVE file, sinkhole hostname). Rejected because Viktor routinely greps `secret/viktor` (135 keys) and lists `kubectl get secret -A` — any read-trigger canary self-fires. Use-based canaries (zero-RBAC SA tokens with audit alerts on use) were also considered but rejected in favor of cleaner source-IP anomaly detection (K9, V7) on REAL tokens — same threat model, no fake-token operational burden.
--- a/docs/plans/2026-05-30-breakglass-ssh-access-design.md
+++ b/docs/plans/2026-05-30-breakglass-ssh-access-design.md
@@ -0,0 +1,285 @@
+# Break-Glass SSH Access — Design
+
+> **⚠️ SUPERSEDED 2026-06-11** by `2026-06-11-breakglass-ssh-redesign-design.md`.
+> The port-knock was removed: it added no real security (the SSH key already
+> makes the port brute-force-proof) and its knock sequence lived only in
+> in-cluster Vault — unreachable in the exact cold/away scenario break-glass
+> exists for, which caused a real lockout. Retained for history. As-built:
+> `docs/runbooks/breakglass-ssh.md`.
+
+- **Date**: 2026-05-30
+- **Status**: Draft — pending user review
+- **Owner**: Viktor
+- **Related**: `docs/architecture/vpn.md`, `docs/architecture/security.md`, `infra/.claude/CLAUDE.md` (Security Posture Wave 1)
+
+## 1. Goal
+
+Provide a **cold, brute-force-proof backdoor onto the home LAN from the public
+internet** for the case where the Kubernetes cluster and every cluster-hosted
+remote-access path are down (cloudflared, Headscale/Tailscale, in-cluster
+WireGuard), but the **Proxmox host, pfSense, and the edge router are still up**.
+
+### Hard requirements (from the user)
+
+1. **Cold-survivable**: must work when the k8s cluster + all its tunnels are
+   down. The path must touch **nothing in the cluster** (no Authentik, Traefik,
+   Technitium/AdGuard DNS, cloudflared).
+2. **Full LAN access** once connected (SSH to Proxmox host, pfSense, Synology,
+   k8s API, etc.).
+3. **No brute force**: no password-guessable surface.
+4. **Client uses only software pre-installed on Linux/macOS** — no WireGuard /
+   Tailscale / fwknop client install. Stock `ssh` (+ `bash`) only.
+5. **Minimal effort**, and ideally **honor the locked Wave 1 policy**
+   (`no public-IP access — … PVE sshd must transit LAN or Headscale`).
+
+## 2. Decision
+
+**Key-only SSH to the Proxmox host, gated behind a UDP port-knock.**
+
+- The Proxmox host (`192.168.1.127`) is the entry point — it's the recovery box
+  (`virsh`/`qm` to reboot the pfSense VM, `kubectl`, full hypervisor control)
+  and it sits directly on the `192.168.1.0/24` segment, so the path **does not
+  traverse pfSense or the cluster** — it survives a wedged pfSense too, not just
+  a down cluster.
+- SSH is the only externally-usable remote tool **pre-installed on every
+  Linux/macOS box**, satisfying requirement 4.
+- **Key-only auth** (no passwords anywhere) makes password brute force
+  impossible → requirement 3.
+- A **port-knock** keeps the external SSH port **closed/invisible to scanners**
+  until a knock sequence is sent. This restores the "no standing public service"
+  property we'd have had with WireGuard and keeps us within the **intent** of the
+  Wave 1 policy (PVE sshd is not internet-scannable). The knock is sent with a
+  **bash `/dev/udp` one-liner** — zero install.
+
+### Alternatives rejected
+
+| Option | Why rejected |
+|---|---|
+| WireGuard road-warrior on pfSense | Needs a WireGuard **client app** (fails requirement 4). Was the prior design. |
+| Tailscale / Headscale | Client app + control plane is in-cluster (dies cold). |
+| Browser → web admin UI (Proxmox/pfSense/Synology) | "Pre-installed" (browser) but password-based → brute-forceable, far larger attack surface than a key-only SSH port. |
+| Plain **exposed** key-only SSH (no knock) | Brute-force-proof, but a **publicly visible** service (Shodan-catalogued) and a standing violation of the Wave 1 "no public PVE sshd" policy. The knock removes the standing exposure for ~15 min more setup. |
+| fwknop / cryptographic SPA | Strongest hiding, but needs a **client install** (fails requirement 4). |
+
+## 3. Architecture
+
+```
+  Your laptop (anywhere) — stock ssh + bash, nothing installed
+     │  (1) UDP knock sequence  →  bash: echo > /dev/udp/<pub>/<port>   (instant, no handshake)
+     │  (2) ssh -p 52222 root@<pub>
+     ▼
+  Edge router 192.168.1.1   (the box the stored password unlocks)
+     │  forwards:  UDP <k1>,<k2>,<k3>  +  TCP 52222   →   192.168.1.127
+     ▼
+  Proxmox host 192.168.1.127   ← path bypasses pfSense entirely
+     ├─ knockd (libpcap) sees the UDP knock → opens TCP 52222 for your source IP (30 s)
+     ├─ sshd listens on :22 (LAN admin, always) AND :52222 (external, knock-gated), key-only
+     └─ once in:  virsh/qm (reboot pfSense VM), kubectl, ssh -J / ssh -D → full LAN
+```
+
+**Why it meets "cold + full LAN":** the host is up by definition of the chosen
+failure mode; nothing in the path depends on k8s, pfSense, or DNS. From the host
+you reach the whole LAN either directly (it's on `192.168.1.0/24` and routes to
+the VLANs via pfSense when pfSense is up) or by using SSH's built-in
+`-J`/`-D` — both stock, no install.
+
+## 4. Components
+
+### 4.1 Edge router @ 192.168.1.1 (manual, in the browser)
+Add port-forwards (same place the existing `51821` WireGuard forward lives):
+- **TCP 52222 → 192.168.1.127:52222** (external SSH; no port rewrite — see §4.3 rationale)
+- **UDP `<k1>`, `<k2>`, `<k3>` → 192.168.1.127** (knock ports; actual numbers in Vault)
+
+If the router supports a **port range** forward, a single range covering the
+knock ports + 52222 is tidier than four rules.
+
+> **Verify (#1 implementation check):** whether `.1` **preserves the source IP**
+> on forwarded packets (typical DNAT) or **SNATs** them to `192.168.1.1`. Test by
+> knocking + connecting from an external network and checking `/var/log/auth.log`
+> + `knockd` syslog for the observed source IP. The design works either way (see
+> §4.3), but it determines knock granularity.
+
+### 4.2 SSH keys & Vault layout
+- Mint a **dedicated** break-glass keypair (ed25519), separate from
+  `secret/viktor/proxmox_ssh_key`, so it's independently revocable and clearly
+  labelled.
+- **Public key** → `/root/.ssh/authorized_keys` on the Proxmox host (no `from=`
+  restriction — break-glass is from-anywhere; the knock + key are the gate).
+- **Private key** → Vault `secret/viktor/breakglass_ssh_privkey` (for
+  re-provisioning) **and** on your laptop at `~/.ssh/breakglass_ed25519`
+  (chmod 600).
+- **Knock sequence** → Vault `secret/viktor/breakglass_knock_sequence` (kept out
+  of git — obscurity value only; see §5).
+
+### 4.3 Proxmox host — sshd hardening
+`/etc/ssh/sshd_config.d/10-breakglass.conf`:
+```
+Port 22
+Port 52222
+PasswordAuthentication no
+KbdInteractiveAuthentication no
+PubkeyAuthentication yes
+PermitRootLogin prohibit-password     # key-only root (PVE recovery norm)
+MaxAuthTries 3
+LoginGraceTime 20
+```
+- sshd listens on **:22 (LAN admin, always allowed)** and **:52222 (external,
+  knock-gated)**. Using a dedicated external port (not a DNAT rewrite to 22)
+  lets the firewall distinguish LAN vs external **regardless of `.1` SNAT
+  behaviour** (§4.1) — LAN admin on `:22` is never affected by the gate.
+- **Default to root key-only** for recovery practicality. *Alternative for
+  review:* a dedicated `breakglass` sudo user instead of root.
+
+> **Verify (#2):** key login already works for your normal access **before**
+> `PasswordAuthentication no` is committed — no lockout. (Backup rsync jobs
+> already use keys, so this is likely already effectively true.)
+
+### 4.4 Host firewall (knock gate)
+Default-drop the external SSH port; knockd punches a per-source hole. LAN admin
+(`:22`) and established sessions are untouched:
+```
+# allow established / related
+iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
+# LAN admin + backups: SSH on :22 always allowed
+iptables -A INPUT -p tcp --dport 22 -j ACCEPT
+# external SSH on :52222 closed by default — knockd opens it per-source
+iptables -A INPUT -p tcp --dport 52222 -j DROP
+```
+- **knockd uses libpcap**, so it sees the UDP knock packets even though iptables
+  drops them — the knock ports stay **silent/closed** to scanners.
+- **pve-firewall coexistence (verify #3):** confirm whether the PVE firewall is
+  enabled. If it is, express these rules through it (or a dedicated chain) so a
+  pve-firewall reload doesn't wipe the knockd-managed rule. Default PVE installs
+  often have it off at datacenter level.
+
+### 4.5 knockd
+`apt install knockd` (Debian/PVE). `/etc/knockd.conf`:
+```
+[options]
+    UseSyslog
+    Interface = vmbr0          # the 192.168.1.127 interface
+
+[breakglass]
+    sequence      = <k1>:udp,<k2>:udp,<k3>:udp     # real ports from Vault
+    seq_timeout   = 10
+    start_command = /usr/sbin/iptables -I INPUT 1 -s %IP% -p tcp --dport 52222 -j ACCEPT
+    cmd_timeout   = 30
+    stop_command  = /usr/sbin/iptables -D INPUT -s %IP% -p tcp --dport 52222 -j ACCEPT
+```
+- **UDP knock** → the client knock is fire-and-forget (`/dev/udp`), no TCP-hang
+  on the client (a TCP knock to a dropped port would block until timeout).
+- Opens `:52222` for the knocker's source IP for **30 s**; an SSH session
+  established within that window **persists** via conntrack ESTABLISHED after the
+  rule is removed. Enable + start the `knockd` service.
+
+### 4.6 fail2ban (defense-in-depth)
+`apt install fail2ban`, sshd jail (watches `auth.log`, bans repeat failures).
+Local to the host, **no cluster dependency**. Catches anything that gets past the
+knock to the sshd listener.
+
+### 4.7 Client side (laptop — stock tools only)
+`~/.ssh/config`:
+```
+Host breakglass
+    HostName <public-ip-or-dyndns>
+    Port 52222
+    User root
+    IdentityFile ~/.ssh/breakglass_ed25519
+```
+Knock + connect — a shell function using **bash builtins only** (works on
+macOS `/bin/bash` + Linux; UDP send is instant):
+```sh
+bg() {
+  local host=<public-ip-or-dyndns>
+  for p in <k1> <k2> <k3>; do echo -n x > "/dev/udp/$host/$p"; sleep 0.4; done
+  sleep 0.5
+  ssh breakglass "$@"
+}
+```
+- **Full LAN, no install:** `ssh -J breakglass <internal-host>` (jump), or
+  `ssh -D 1080 breakglass` then point a browser/`curl` at SOCKS5 `127.0.0.1:1080`
+  to reach any internal IP. From the host shell you already have everything.
+- *Optional fully-transparent variant:* fold the knock into a `ProxyCommand` in
+  the `Host breakglass` block so plain `ssh breakglass` knocks automatically.
+
+### 4.8 Cold-scenario IP cheat sheet (DNS is down when the cluster is down)
+Technitium + AdGuard are in-cluster, so `.lan` resolution is gone in a cold
+event. Use IPs:
+
+| Host | IP |
+|---|---|
+| Proxmox host | `192.168.1.127` (also `10.0.10.1` VLAN10) |
+| pfSense | `10.0.20.1` (WAN `192.168.1.2`) |
+| k8s API server | `10.0.20.100` |
+| Synology NAS | `192.168.1.13` |
+| Edge router | `192.168.1.1` |
+| Traefik LB / MetalLB | `10.0.20.200` / `10.0.20.203` |
+
+## 5. Security analysis
+
+- **Brute force: solved.** No password auth anywhere → password guessing is
+  impossible; key brute force is cryptographically infeasible.
+- **Invisibility / Wave 1 intent: satisfied.** The external SSH port is
+  default-dropped and the knock ports are pcap-sniffed (never answered), so a
+  scanner sees a closed/silent host — PVE sshd is **not internet-scannable**,
+  honouring the spirit of "no public-IP access to PVE sshd".
+- **The knock is obscurity, not cryptography.** A port-knock sequence is
+  plaintext and replayable by a passive on-path observer. **The SSH key is the
+  real access control** — the knock only removes the standing/scannable surface.
+  (Cryptographic SPA = fwknop, rejected for needing a client install.) Treat the
+  knock sequence as a secret-ish convenience, not a second cryptographic factor.
+- **Residual risks** (none are brute force):
+  1. An sshd **0-day** exploitable during the 30 s open window → mitigation: keep
+     PVE patched; short `cmd_timeout`; fail2ban.
+  2. **Private key theft** → mitigation: key has a passphrase; revoke by removing
+     the line from `authorized_keys`.
+  3. If `.1` **SNATs** (§4.1), the 30 s window opens `:52222` for the shared
+     `192.168.1.1` source — anyone else arriving via `.1` in that window could
+     reach the sshd banner, but still needs your key. Mitigated by the short
+     window + key-only + fail2ban.
+- **Deliberate, documented exception** to the Wave 1 "no public-IP access"
+  policy, scoped to this single knock-gated port. To be recorded in
+  `security.md` + the Wave 1 note in `infra/.claude/CLAUDE.md` on implementation.
+
+## 6. What's automated vs manual
+
+- **I do**: generate the keypair + knock sequence, store them in Vault, produce
+  the exact `sshd_config.d` snippet, `knockd.conf`, iptables rules, the client
+  `~/.ssh/config` + `bg()` function, and write the runbook + doc updates.
+- **Manual / careful (live devices)**: the `.1` edge-router forwards are done by
+  you in the browser (out-of-Terraform, live device). The Proxmox host changes
+  (sshd, knockd, iptables, fail2ban) are applied over SSH **with key-login
+  verified first** to avoid lockout; pfSense is **not** touched. None of this is
+  a `tg apply` — pfSense and the edge router are not Terraform-managed.
+
+## 7. Testing & verification
+1. From an **external** network (phone hotspot): run `bg`; confirm knockd syslog
+   shows the sequence + opens `:52222`; SSH succeeds.
+2. **Without** knocking: `ssh -p 52222` from external → connection times out
+   (the port is dropped, not rejected). A plain port scan of `52222` + the knock ports → silent.
+3. LAN admin on `:22` still works (no regression); backup rsync jobs unaffected.
+4. Full-LAN: `ssh -J breakglass 10.0.20.1` (pfSense) and `ssh -D 1080` SOCKS to
+   an internal IP.
+5. Determine `.1` source-IP behaviour (verify #1) and adjust knock granularity
+   note accordingly.
+
+## 8. Failure modes & rotation
+- **Proxmox host down** (not just cluster): this path is gone — that's the
+  out-of-band tier (serial/IPMI/separate device), explicitly **out of scope**.
+- **`.1` router config reset**: forwards lost → re-add from this doc; consider
+  exporting the `.1` config for backup.
+- **Public IP change**: use a hostname endpoint (Cloudflare-resolved) so it
+  auto-follows; keep the raw IP as fallback.
+- **Key/knock compromise**: remove the `authorized_keys` line (kills access
+  instantly); rotate the knock sequence in `knockd.conf` + Vault.
+
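The "remove the `authorized_keys` line" revocation above can be sketched as a small helper. This is an illustration, not the runbook procedure: `revoke_key_by_comment` is a hypothetical name, and it assumes the key is identifiable by its comment (the plan generates keys with a `breakglass-YYYYMMDD` comment).

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every line containing the given label from an
# authorized_keys file, keeping a .bak copy of the previous contents.
# Args: $1 = path to authorized_keys, $2 = label to match (e.g. key comment).
revoke_key_by_comment() {
  local file="$1" label="$2"
  cp -p "$file" "$file.bak" || return 1
  # index() does a literal substring match, so no regex escaping is needed
  awk -v l="$label" 'index($0, l) == 0' "$file.bak" > "$file"
}
```

For example, `revoke_key_by_comment /root/.ssh/authorized_keys breakglass-20260530` run on the host. Sessions that are already established are not terminated by this; only new logins with the revoked key are refused.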
+## 9. Out of scope
+- Host-down / site-down out-of-band access (IPMI, LTE) — a future tier.
+- Phone access (would need an SSH **app**, e.g. Termius — outside the
+  "pre-installed Linux/macOS" constraint; laptop is the target).
+
+## 10. Docs to update on implementation
+- `docs/architecture/vpn.md` — add a "Break-glass SSH" section.
+- `docs/architecture/security.md` + Wave 1 note in `infra/.claude/CLAUDE.md` —
+  record the deliberate knock-gated exception to "no public PVE sshd".
+- New runbook `docs/runbooks/breakglass-ssh.md` — connect + rotate procedure.
--- a/docs/plans/2026-05-30-breakglass-ssh-access-plan.md
+++ b/docs/plans/2026-05-30-breakglass-ssh-access-plan.md
@@ -0,0 +1,395 @@
+# Break-Glass SSH Access — Implementation Plan
+
+> **⚠️ SUPERSEDED 2026-06-11** by the redesign in
+> `2026-06-11-breakglass-ssh-redesign-design.md` (port-knock removed). Retained
+> for history. As-built: `docs/runbooks/breakglass-ssh.md`.
+
+> **Execution model:** This plan mutates **live devices** (the Proxmox host's sshd, and the TP-Link edge router). It is **human-gated**, NOT for autonomous subagents. Each live step is applied with anti-lockout verification, and every edge-router change is made by Viktor (or by the browse tool with explicit per-change approval). Steps use `- [ ]` checkboxes.
+
+**Goal:** Stand up a cold, brute-force-proof SSH backdoor onto the LAN — key-only SSH to the Proxmox host (`192.168.1.127`) gated behind a UDP port-knock — then decommission the legacy Synology SSH exposure and tighten UPnP.
+
+**Architecture:** Edge router `.1` forwards a UDP knock sequence + TCP `52222` to the Proxmox host. The host runs `knockd` (libpcap) which opens `52222` for the knocker's IP for 30 s; `sshd` listens on `:22` (LAN, always) and `:52222` (external, knock-gated), key-only. Path bypasses pfSense + the k8s cluster. Client uses only stock `ssh` + `bash`.
+
+**Tech stack:** OpenSSH, knockd, iptables, fail2ban (Debian/PVE host); TP-Link Archer AX6000 UI (edge router); HashiCorp Vault (secrets); Docker (`/home/wizard/tools/insecure-browse` for any router automation).
+
+**Reference:** design doc `2026-05-30-breakglass-ssh-access-design.md`. Router audit (current `.1` forwards) recorded in task notes + `/home/wizard/tools/insecure-browse/out/`.
+
+---
+
+## Pre-flight (read before starting)
+
+- **Anti-lockout rule:** never disable password auth or reload sshd without an *already-open* root session held + a *new* session verified. Applies to every host step.
+- **Live-router rule:** all `.1` changes are made by Viktor in the UI (or browse-tool with explicit approval). No blind automation of router writes.
+- **Ordering rule:** the legacy Synology SSH forward (Rule 6) is **not** closed until break-glass is verified working from an external network (Phase 4 gates on Phase 4-pre verification).
+- **Host access:** PVE host reached as `ssh root@192.168.1.127` from the LAN.
+- **Commit gate:** the infra repo currently has unmerged conflicts + an in-progress provider/backend migration. Do NOT commit (Phase 6) until Viktor confirms the repo is clean.
+
+---
+
+## Phase 0 — Generate secrets (no live changes)
+
+### Task 0.1: Break-glass SSH keypair
+
+**Files:** none in repo (secrets → Vault).
+
+- [ ] **Step 1: Generate a dedicated ed25519 keypair (with passphrase)**
+
+```bash
+mkdir -p ~/.ssh
+ssh-keygen -t ed25519 -a 100 -C "breakglass-$(date +%Y%m%d)" -f ~/.ssh/breakglass_ed25519
+# set a passphrase when prompted (so a stolen laptop key isn't instantly usable)
+```
+
+- [ ] **Step 2: Store the private key + public key in Vault**
+
+```bash
+vault kv patch secret/viktor \
+  breakglass_ssh_privkey=@$HOME/.ssh/breakglass_ed25519 \
+  breakglass_ssh_pubkey="$(cat ~/.ssh/breakglass_ed25519.pub)"
+```
+
+- [ ] **Step 3: Verify the keys are retrievable**
+
+```bash
+vault kv get -field=breakglass_ssh_pubkey secret/viktor
+```
+Expected: prints the `ssh-ed25519 AAAA... breakglass-YYYYMMDD` line.
+
+### Task 0.2: Knock sequence
+
+- [ ] **Step 1: Generate 3 random UDP knock ports**
+
+```bash
+KNOCK="$(shuf -i 20000-60000 -n 3 | paste -sd, -)"; echo "$KNOCK"
+```
+
+- [ ] **Step 2: Store the sequence in Vault (keep it out of git)**
+
+```bash
+vault kv patch secret/viktor breakglass_knock_sequence="$KNOCK"
+vault kv get -field=breakglass_knock_sequence secret/viktor
+```
+Expected: prints three comma-separated ports, e.g. `28411,49027,33180`.
+
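The `20000-60000` range used in Step 1 overlaps ports the edge router already forwards (the design doc mentions an existing `51821` WireGuard forward, and `52222` is the SSH port itself). A hedged sketch of a generator that skips such ports; `gen_knock_ports` and its argument layout are hypothetical, and it assumes GNU `shuf` and `awk` are available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick N distinct ports from a range, skipping any port
# passed as an exclusion. Prints them comma-separated; returns 1 if the range
# cannot supply enough ports.
# Args: $1 = count, $2 = "lo-hi" range, remaining args = ports to exclude.
gen_knock_ports() {
  local count="$1" range="$2"
  shift 2
  local exclude=" $* "
  shuf -i "$range" | awk -v n="$count" -v ex="$exclude" '
    index(ex, " " $1 " ") { next }              # skip excluded ports
    { out = out (c++ ? "," : "") $1 }           # append to the comma list
    c == n { print out; found = 1; exit }       # stop once we have enough
    END { if (!found) exit 1 }'
}
```

Usage under these assumptions: `KNOCK="$(gen_knock_ports 3 20000-60000 51821 52222)"`, then store it exactly as in Step 2.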
+---
+
+## Phase 1 — Proxmox host: key-only SSH + knock gate (LIVE host change)
+
+> Run everything in this phase **on the PVE host**. Keep your current `ssh root@192.168.1.127` session open the entire phase.
+
+### Task 1.1: Pre-checks (no changes yet)
+
+- [ ] **Step 1: Confirm key login already works (anti-lockout baseline)**
+
+The break-glass key is authorized later (Task 1.2). For now, from your laptop, confirm your *existing* admin key works:
+```bash
+ssh -o PasswordAuthentication=no root@192.168.1.127 'echo KEY_LOGIN_OK'
+```
+Expected: `KEY_LOGIN_OK` (key auth works → safe to disable passwords later). If it prompts for a password, STOP and fix key auth first.
+
+- [ ] **Step 2: Check whether the PVE firewall is active (coexistence)**
+
+```bash
+ssh root@192.168.1.127 'pve-firewall status 2>/dev/null; iptables -S | head'
+```
+Expected: note whether `Status: enabled/running`. If **enabled**, add the Task 1.4 rules via PVE's firewall (Datacenter→Firewall) instead of raw iptables, OR disable it if unused. If **disabled** (common), proceed with the raw-iptables approach below.
+
+### Task 1.2: Authorize the break-glass key
+
+- [ ] **Step 1: Append the break-glass public key to root's authorized_keys**
+
+```bash
+PUB="$(vault kv get -field=breakglass_ssh_pubkey secret/viktor)"
+ssh root@192.168.1.127 "grep -qF '$PUB' /root/.ssh/authorized_keys || echo '$PUB' >> /root/.ssh/authorized_keys"
+```
+
+- [ ] **Step 2: Verify break-glass key logs in (on :22, still default)**
+
+```bash
+ssh -i ~/.ssh/breakglass_ed25519 -o PasswordAuthentication=no root@192.168.1.127 'echo BREAKGLASS_KEY_OK'
+```
+Expected: `BREAKGLASS_KEY_OK`.
+
+### Task 1.3: sshd dual-port + key-only
+
+**Files:** Create on host: `/etc/ssh/sshd_config.d/10-breakglass.conf`
+
+- [ ] **Step 1: Write the sshd drop-in**
+
+```bash
+ssh root@192.168.1.127 'cat > /etc/ssh/sshd_config.d/10-breakglass.conf' <<'EOF'
+Port 22
+Port 52222
+PasswordAuthentication no
+KbdInteractiveAuthentication no
+PubkeyAuthentication yes
+PermitRootLogin prohibit-password
+MaxAuthTries 3
+LoginGraceTime 20
+EOF
+```
+
+- [ ] **Step 2: Validate config syntax (do NOT reload yet)**
+
+```bash
+ssh root@192.168.1.127 'sshd -t && echo SSHD_CONFIG_OK'
+```
+Expected: `SSHD_CONFIG_OK`. If error, fix the drop-in before reloading.
+
+- [ ] **Step 3: Reload sshd (current session stays alive)**
+
+```bash
+ssh root@192.168.1.127 'systemctl reload ssh && echo RELOADED'
+```
+Expected: `RELOADED`.
+
+- [ ] **Step 4: Verify a NEW key session works on :22 AND :52222 before trusting it**
+
+```bash
+ssh -i ~/.ssh/breakglass_ed25519 -p 22    root@192.168.1.127 'echo OK22'
+ssh -i ~/.ssh/breakglass_ed25519 -p 52222 root@192.168.1.127 'echo OK52222'
+```
+Expected: `OK22` and `OK52222`. (If `:52222` refuses, sshd may not have bound the second port — check `ss -tlnp | grep ssh` on the host.) Only after both succeed, the old session is safe to drop.
+
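`sshd -t` in Step 2 only checks syntax. Because sshd uses the first value it obtains for each keyword, another config file could still shadow the drop-in. A minimal sketch of an effective-config check that reads `sshd -T` output on stdin; `check_effective_sshd` is a hypothetical helper, and it checks only the keywords whose `sshd -T` spelling is stable across OpenSSH versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `sshd -T` output on stdin and confirm that
# password auth is off and both ports from the drop-in are in effect.
check_effective_sshd() {
  local dump
  dump="$(cat)"
  printf '%s\n' "$dump" | grep -qx 'passwordauthentication no' \
    || { echo "password auth still enabled"; return 1; }
  printf '%s\n' "$dump" | grep -qx 'port 22' \
    || { echo "port 22 not configured"; return 1; }
  printf '%s\n' "$dump" | grep -qx 'port 52222' \
    || { echo "port 52222 not configured"; return 1; }
  echo EFFECTIVE_CONFIG_OK
}
```

Usage, from the laptop: `ssh root@192.168.1.127 'sshd -T' | check_effective_sshd`. Expected under the Task 1.3 drop-in: `EFFECTIVE_CONFIG_OK`.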
+### Task 1.4: Base firewall (default-drop :52222, allow :22 + established)
+
+**Files:** Create on host: `/usr/local/sbin/breakglass-firewall.sh`, `/etc/systemd/system/breakglass-firewall.service`
+
+- [ ] **Step 1: Write the idempotent base-firewall script (dedicated chain)**
+
+```bash
+ssh root@192.168.1.127 'cat > /usr/local/sbin/breakglass-firewall.sh' <<'EOF'
+#!/usr/bin/env bash
+set -euo pipefail
+# Idempotent: (re)build a dedicated BREAKGLASS chain hooked into INPUT.
+iptables -N BREAKGLASS 2>/dev/null || iptables -F BREAKGLASS
+iptables -C INPUT -j BREAKGLASS 2>/dev/null || iptables -I INPUT 1 -j BREAKGLASS
+# established/related always allowed
+iptables -A BREAKGLASS -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
+# LAN admin on :22 always allowed (.1 does NOT forward :22 to this host, so :22 is LAN-only)
+iptables -A BREAKGLASS -p tcp --dport 22 -j ACCEPT
+# external SSH on :52222 closed by default; knockd punches a per-source ACCEPT into INPUT pos 1
+iptables -A BREAKGLASS -p tcp --dport 52222 -j DROP
+EOF
+ssh root@192.168.1.127 'chmod 0755 /usr/local/sbin/breakglass-firewall.sh'
+```
+
+- [ ] **Step 2: Write a boot-time systemd unit (persists across reboot, before knockd)**
+
+```bash
+ssh root@192.168.1.127 'cat > /etc/systemd/system/breakglass-firewall.service' <<'EOF'
+[Unit]
+Description=Break-glass base firewall (SSH knock gate)
+After=network-pre.target
+Before=knockd.service
+Wants=network-pre.target
+
+[Service]
+Type=oneshot
+ExecStart=/usr/local/sbin/breakglass-firewall.sh
+RemainAfterExit=yes
+
+[Install]
+WantedBy=multi-user.target
+EOF
+ssh root@192.168.1.127 'systemctl daemon-reload && systemctl enable --now breakglass-firewall.service && echo FW_APPLIED'
+```
+Expected: `FW_APPLIED`.
+
+- [ ] **Step 3: Verify LAN :22 still works and :52222 is now dropped from LAN**
+
+```bash
+ssh -i ~/.ssh/breakglass_ed25519 -p 22 root@192.168.1.127 'echo STILL_OK22'         # works
+nc -z -w3 192.168.1.127 52222 && echo "OPEN(bad)" || echo "CLOSED_AS_EXPECTED"      # closed pre-knock
+```
+Expected: `STILL_OK22` and `CLOSED_AS_EXPECTED`.
+
+### Task 1.5: knockd
+
+**Files:** Create/modify on host: `/etc/knockd.conf`, `/etc/default/knockd`
+
+- [ ] **Step 1: Install knockd (host daemon — must be native, not Docker, to manage host iptables)**
+
+```bash
+ssh root@192.168.1.127 'apt-get update -qq && apt-get install -y knockd && echo KNOCKD_INSTALLED'
+```
+Expected: `KNOCKD_INSTALLED`.
+
+- [ ] **Step 2: Write knockd.conf with the Vault knock sequence (UDP)**
+
+```bash
+KNOCK="$(vault kv get -field=breakglass_knock_sequence secret/viktor)"   # e.g. 28411,49027,33180
+IFS=, read -r K1 K2 K3 <<<"$KNOCK"   # split the comma-separated sequence into three ports
+ssh root@192.168.1.127 "cat > /etc/knockd.conf" <<EOF
+[options]
+    UseSyslog
+    Interface = vmbr0
+
+[breakglass]
+    sequence      = ${K1}:udp,${K2}:udp,${K3}:udp
+    seq_timeout   = 10
+    start_command = /usr/sbin/iptables -I INPUT 1 -s %IP% -p tcp --dport 52222 -j ACCEPT
+    cmd_timeout   = 30
+    stop_command  = /usr/sbin/iptables -D INPUT -s %IP% -p tcp --dport 52222 -j ACCEPT
+EOF
+```
+
+- [ ] **Step 3: Enable + start knockd**
+
+```bash
+ssh root@192.168.1.127 "sed -i 's/^START_KNOCKD=.*/START_KNOCKD=1/' /etc/default/knockd 2>/dev/null || echo 'START_KNOCKD=1' >> /etc/default/knockd"
+ssh root@192.168.1.127 'systemctl enable --now knockd && systemctl is-active knockd'
+```
+Expected: `active`.
+
+### Task 1.6: fail2ban (defense-in-depth)
+
+- [ ] **Step 1: Install + enable fail2ban with the default sshd jail**
+
+```bash
+ssh root@192.168.1.127 'apt-get install -y fail2ban && systemctl enable --now fail2ban && fail2ban-client status sshd >/dev/null && echo F2B_OK'
+```
+Expected: `F2B_OK` (sshd jail active).
+
+---
+
+## Phase 2 — Edge router `.1` forwards (LIVE router change — Viktor executes)
+
+> In the AX6000 UI: **Advanced → NAT Forwarding → Port Forwarding → Add**. Do NOT remove anything yet.
+
+- [ ] **Step 1: Add the SSH break-glass forward**
+  - Name `breakglass-ssh`, External Port `52222`, Internal IP `192.168.1.127`, Internal Port `52222`, Protocol `TCP`, Enable.
+
+- [ ] **Step 2: Add the three UDP knock forwards** (values from `vault kv get -field=breakglass_knock_sequence secret/viktor`)
+  - For each of the 3 ports: Name `bg-knock-N`, External Port `<port>`, Internal IP `192.168.1.127`, Internal Port `<same port>`, Protocol `UDP`, Enable.
+
+- [ ] **Step 3: (verify #1) Determine whether `.1` preserves source IP or SNATs**
+
+After Phase 3 connects once, on the host check the observed source:
+```bash
+ssh root@192.168.1.127 'journalctl -u knockd -n 20 --no-pager | grep -i "stage\|open"'
+```
+If `%IP%` is a public IP → source preserved (per-IP granularity). If it's `192.168.1.1` → `.1` SNATs (knock opens `:52222` for the shared `.1` source during the 30 s window). Both are acceptable with the dual-port + key-only model; just note it in the runbook.
+
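The decision rule in Step 3 can be sketched as a tiny classifier for the source address that knockd logged. `classify_knock_source` is a hypothetical helper; it assumes the edge router's LAN address is `192.168.1.1`, as in the architecture diagram.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the source IP observed by knockd/sshd for an
# external test connection.
# Args: $1 = observed source IP, $2 = router LAN IP (default 192.168.1.1).
classify_knock_source() {
  local observed="$1"
  local router_ip="${2:-192.168.1.1}"
  if [ "$observed" = "$router_ip" ]; then
    echo "snat"        # router rewrote the source: the window opens for everything arriving via .1
  else
    echo "preserved"   # real client IP seen: the knock opens :52222 per source
  fi
}
```

Feed it the address from the `journalctl -u knockd` output and record the result in the runbook.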
+---
+
+## Phase 3 — Client config (laptop, no live infra change)
+
+**Files:** Modify `~/.ssh/config`; add a shell function to `~/.zshrc`/`~/.bashrc`.
+
+- [ ] **Step 1: Add the SSH host block**
+
+```bash
+cat >> ~/.ssh/config <<'EOF'
+
+Host breakglass
+    HostName viktorbarzin.ddns.net
+    Port 52222
+    User root
+    IdentityFile ~/.ssh/breakglass_ed25519
+EOF
+```
+(`viktorbarzin.ddns.net` is the router's NO-IP DDNS name — follows the dynamic WAN IP. Raw IP `176.12.22.76` is the fallback.)
+
+- [ ] **Step 2: Add the knock+connect function**
+
+```bash
+cat >> ~/.zshrc <<'EOF'
+
+bg() {
+  local host="viktorbarzin.ddns.net"
+  local seq; seq="$(vault kv get -field=breakglass_knock_sequence secret/viktor 2>/dev/null || echo "")"
+  [ -z "$seq" ] && { echo "no knock sequence (vault?)"; return 1; }
+  # /dev/udp is a bash feature (zsh has no equivalent), so run the knock loop under bash
+  bash -c 'for p in ${1//,/ }; do echo -n x > "/dev/udp/$2/$p"; sleep 0.4; done' _ "$seq" "$host" 2>/dev/null
+  sleep 0.5
+  ssh breakglass "$@"
+}
+EOF
+```
+> Note: `/dev/udp/<host>/<port>` redirection is implemented by bash (`/bin/bash` on macOS + Linux), not by zsh, which is why the knock loop runs under `bash -c` even though `bg` is defined in `~/.zshrc`. If your bash build lacks network redirections, replace the `echo` with `echo -n x | nc -u -w1 "$2" "$p"`.
+
+---
+
+## Phase 4-pre — Verify break-glass END-TO-END (gates Phase 4)
+
+> Do this from an **external** network (phone hotspot / tethered), NOT the home LAN.
+
+- [ ] **Step 1: Without knocking, the port is silent**
+
+```bash
+nc -z -w3 viktorbarzin.ddns.net 52222 && echo "OPEN(bad)" || echo "SILENT_OK"
+```
+Expected: `SILENT_OK`.
+
+- [ ] **Step 2: Knock + connect succeeds**
+
+```bash
+bg 'hostname; echo BREAKGLASS_E2E_OK'
+```
+Expected: the PVE hostname + `BREAKGLASS_E2E_OK`.
+
+- [ ] **Step 3: Full-LAN reach via the jump (no extra install)**
+
+```bash
+ssh -J breakglass root@10.0.20.1 'echo PFSENSE_REACHED' 2>/dev/null || echo "check pfSense ssh"
+ssh -J breakglass admin@192.168.1.13 'echo SYNOLOGY_REACHED' 2>/dev/null || echo "check synology ssh"
+```
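+Expected: confirms you can reach pfSense + Synology *through* break-glass (so closing Rule 6 loses nothing). Each `ssh -J` opens a new connection to `:52222`, so run these within the 30 s window of the Step 2 knock, or knock again first.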
+
+- [ ] **Step 4: LAN admin unaffected**
+
+From the home LAN: `ssh -p 22 root@192.168.1.127 'echo LAN22_OK'` → `LAN22_OK`.
+
+**GATE:** Only proceed to Phase 5 (router cleanup) once Steps 1–4 pass. If any fail, fix before removing the legacy forward.
+
+---
+
+## Phase 5 — Router cleanup (LIVE router change — Viktor executes, AFTER Phase 4-pre passes)
+
+> AX6000 UI. One pass, all three changes.
+
+- [ ] **Step 1: Remove the Synology SSH exposure (Rule 6)**
+  - Advanced → NAT Forwarding → Port Forwarding → delete (or disable) rule **`HTTP` / 3333 → 192.168.1.13:22**.
+
+- [ ] **Step 2: Delete the stale Proxmox rule (Rule 3)**
+  - Delete the disabled rule **`proxmox` / 8006 → 192.168.1.127**.
+
+- [ ] **Step 3: Disable UPnP**
+  - Advanced → NAT Forwarding → UPnP → toggle **OFF**. (Tailscale on `.101` falls back to DERP relay; the `41643→pfSense` mapping drops.)
+
+- [ ] **Step 4: Verify the Synology SSH is gone from the WAN, break-glass still works**
+
+From an external network:
+```bash
+nc -z -w3 viktorbarzin.ddns.net 3333 && echo "STILL_OPEN(bad)" || echo "SYNOLOGY_SSH_CLOSED_OK"
+bg 'echo BREAKGLASS_STILL_OK'
+```
+Expected: `SYNOLOGY_SSH_CLOSED_OK` and `BREAKGLASS_STILL_OK`.
+
+---
+
+## Phase 6 — Docs + commit (AFTER infra repo is clean)
+
+- [ ] **Step 1: Update `docs/architecture/vpn.md`** — add a "Break-glass SSH" section (knock-gated SSH to PVE host, client `bg()`, cheat-sheet IPs).
+- [ ] **Step 2: Update `docs/architecture/security.md` + the Wave-1 note in `infra/.claude/CLAUDE.md`** — record the deliberate knock-gated exception; **correct the WAN-exposure inventory** (actual `.1` forwards are qbittorrent/stun/turn→pfSense + the new break-glass; Synology SSH removed; UPnP disabled; Remote Management off).
+- [ ] **Step 3: New runbook `docs/runbooks/breakglass-ssh.md`** — connect procedure, knock/key rotation, re-adding `.1` forwards after a router reset.
+- [ ] **Step 4: Commit the design + plan + doc updates** (only once Viktor confirms the repo is committable):
+
+```bash
+git -C /home/wizard/code/infra add \
+  docs/plans/2026-05-30-breakglass-ssh-access-design.md \
+  docs/plans/2026-05-30-breakglass-ssh-access-plan.md \
+  docs/architecture/vpn.md docs/architecture/security.md \
+  docs/runbooks/breakglass-ssh.md .claude/CLAUDE.md
+git -C /home/wizard/code/infra commit -m "docs+feat: break-glass knock-gated SSH; retire Synology SSH forward; disable UPnP [ci skip]"
+git -C /home/wizard/code/infra push origin master
+```
+
+---
+
+## Self-review
+
+- **Spec coverage:** key-only SSH ✅ (1.3), knock gate ✅ (1.4/1.5), invisibility ✅ (4-pre.1), full-LAN via jump ✅ (4-pre.3), no-lockout ✅ (1.1/1.3.4), Wave-1 exception doc ✅ (6.2), close legacy SSH ✅ (5.1), UPnP ✅ (5.3). All design §sections map to a task.
+- **Placeholder scan:** no TBDs; secret values are generated + Vault-stored, referenced via `vault kv get` (concrete, not placeholders).
+- **Consistency:** port `52222`, knock from `secret/viktor/breakglass_knock_sequence`, key `~/.ssh/breakglass_ed25519`, host `192.168.1.127` used consistently throughout.
+- **Open verify items** (flagged inline, non-blocking): #1 `.1` SNAT behaviour (2.3), pve-firewall coexistence (1.1.2).
--- a/docs/plans/2026-06-11-breakglass-ssh-redesign-design.md
+++ b/docs/plans/2026-06-11-breakglass-ssh-redesign-design.md
@ -0,0 +1,73 @@
+# Break-glass SSH — Redesign
+
+- **Date**: 2026-06-11
+- **Status**: Implemented
+- **Owner**: Viktor
+- **Supersedes**: `2026-05-30-breakglass-ssh-access-{design,plan}.md` (port-knock design)
+- **As-built runbook**: `docs/runbooks/breakglass-ssh.md`
+
+## Why redesign
+
+The 2026-05-30 design gated a key-only SSH port on the Proxmox host behind a UDP
+**port-knock** (knockd). It caused a real lockout, for a structural reason:
+
+- The knock sequence was 3 random ports stored **only** in Vault, and the client
+  helper fetched it from Vault at connect time.
+- **Vault is in-cluster** and not publicly reachable (Wave-1 policy). In the
+  exact scenario break-glass exists for — away from home, cluster/tunnels down —
+  the knock sequence is unreachable and unmemorable. Circular dependency.
+
+The knock's only benefit was hiding an already brute-force-proof port; its cost
+was that fragility. For a *recovery* path, robustness beats stealth.
+
+## Decision
+
+**Plain key-only SSH to the Proxmox host on `:52222`, openly reachable, no knock.**
+Hardened with: the exposed port trusts only a dedicated break-glass key
+(`Match LocalPort`), per-source connection rate-limiting (iptables hashlimit),
+and fail2ban. Scenario covered: *cluster + tunnels down, host + pfSense + router
+up* (the common "I'm away and need in" case — confirmed with Viktor; deeper
+"pfSense wedged" / "host down" tiers are explicitly out of scope).
+
+Alternatives considered and rejected: keeping the knock (fragile, circular);
+Tailscale-on-pfSense (briefly chosen, then dropped — reintroduces the upstream
+dependency Headscale is self-hosted to avoid, and the user preferred a
+self-contained stock-ssh path); WireGuard road-warrior (needs a client, and the
+self-contained SSH path was preferred).
+
+## Components
+
+| Layer | Change | Source of truth |
+|---|---|---|
+| sshd | dual-port `:22` (LAN, all keys) + `:52222` (WAN, break-glass key only via `Match LocalPort`, terminated by `Match all`); key-only everywhere | `scripts/sshd-10-breakglass.conf` |
+| host firewall | `BREAKGLASS` chain: `:52222` rate-limited per source, LAN bypass; replaced the knock-gated default-DROP | `scripts/breakglass-firewall.sh` (+ `breakglass-firewall.service`) |
+| fail2ban | jail fixed for Debian 13 (`journalmatch` by unit, not `_COMM=sshd`, else it never bans), bans on `:22`+`:52222` | `scripts/fail2ban-breakglass-sshd.local` |
+| knockd | **removed** (package purged, config deleted) | — |
+| edge router | `breakglass-ssh` WAN tcp/52222 → 192.168.1.127:52222; **removed** legacy Synology SSH forward (ext 3333 → .13:22) | manual (live device) |
+| Vault | `breakglass_ssh_{pub,priv}key` retained; `breakglass_knock_sequence` now dead | `secret/viktor` |
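+
+The sshd row can be pictured as a drop-in along the following lines. This is a minimal sketch, not the contents of `scripts/sshd-10-breakglass.conf`: the function name `render_breakglass_sshd_conf`, the `MaxAuthTries` value, and the exact directive set are assumptions made for illustration. Only the two ports, `Match LocalPort`, the closing `Match all`, and the `authorized_keys.breakglass` path come from this design.
+
+```bash
+#!/usr/bin/env bash
+# Sketch only: prints an illustrative drop-in to stdout; it does not touch /etc/ssh.
+render_breakglass_sshd_conf() {
+  local wan_port="${1:-52222}"
+  cat <<EOF
+# LAN admin port plus the WAN break-glass port
+Port 22
+Port ${wan_port}
+
+# Key-only everywhere
+PasswordAuthentication no
+KbdInteractiveAuthentication no
+
+# Connections arriving on the exposed port trust only the break-glass key
+Match LocalPort ${wan_port}
+    AuthorizedKeysFile /root/.ssh/authorized_keys.breakglass
+    MaxAuthTries 3
+
+# Close the Match block so anything parsed later applies globally again
+Match all
+EOF
+}
+
+render_breakglass_sshd_conf 52222
+```
+
+The trailing `Match all` matters because a `Match` block otherwise extends to the end of the configuration; without it, settings read after the drop-in would silently apply to `:52222` only.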
+
+## Edge-router constraints discovered (TP-Link AX6000)
+
+- **No port remapping** — external port must equal internal port (rejects e.g.
+  `22 → 52222` as a "conflict"). All forwards are ext==int; hence `:52222` both
+  sides.
+- **Port 22 is reserved** — `22 → 22` is also refused. Break-glass cannot use 22
+  (Viktor's initial preference); `:52222` is the landed port.
+- **Row delete is immediate** (no confirm dialog).
+
+## Security posture
+
+- **Brute force: infeasible** (key-only; password authentication is off, so there is nothing to guess).
+- **Scannable: yes** — deliberate, documented Wave-1 exception (`security.md`).
+- **Residual risks:** sshd 0-day during exposure (mitigate: patch, rate-limit,
+  fail2ban, low MaxAuthTries); break-glass key theft (revoke by removing the
+  `authorized_keys.breakglass` line). Logins are audited (PVE ships sshd auth +
+  snoopy execve to Loki).
+
+## Verification (2026-06-11)
+
+- `:52222` reachable; break-glass key authenticates (`root@pve`).
+- Non-break-glass keys **rejected** on `:52222` (Match isolation works).
+- `:22` LAN admin unaffected (Match all reset confirmed — global root login intact).
+- Full WAN path: `ssh -p 52222 <WAN-IP>` with the break-glass key → `root@pve`.
+- knockd gone; fail2ban jail matches Debian 13 `sshd-session` lines.
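+
+The key-isolation result above can be re-checked from any client with a probe like the one below. This is a sketch under assumptions: `probe_key` and the `SSH_BIN` override are hypothetical names introduced here, and the second key path in the usage comments is only an example. The host name, port, and `~/.ssh/breakglass_ed25519` are the ones used elsewhere in these docs.
+
+```bash
+#!/usr/bin/env bash
+# Sketch only: offer exactly one key to the exposed port and report via exit status.
+# SSH_BIN is a hypothetical override so the invocation can be inspected without connecting.
+probe_key() {
+  local key="$1" host="${2:-viktorbarzin.ddns.net}" port="${3:-52222}"
+  # IdentitiesOnly keeps ssh-agent keys from being offered alongside $key;
+  # BatchMode makes a rejected key fail immediately instead of prompting.
+  "${SSH_BIN:-ssh}" -p "$port" -i "$key" \
+    -o IdentitiesOnly=yes -o BatchMode=yes -o ConnectTimeout=5 \
+    "root@${host}" true
+}
+
+# Usage (from an external network):
+#   probe_key ~/.ssh/breakglass_ed25519   # should exit 0
+#   probe_key ~/.ssh/id_ed25519           # example LAN-admin key; should be refused (ssh exits 255)
+```
+
+Without `IdentitiesOnly=yes` a running agent may offer the break-glass key as well, which would make the negative test pass for the wrong reason.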
--- a/docs/runbooks/breakglass-ssh.md
+++ b/docs/runbooks/breakglass-ssh.md
@ -0,0 +1,158 @@
+# Runbook: Break-glass SSH
+
+Cold-survivable, brute-force-proof SSH onto the home LAN for when the Kubernetes
+cluster and its remote-access tunnels (Headscale, cloudflared) are down but the
+**Proxmox host + edge router are up**. Redesigned 2026-06-11 — the previous
+port-knock design is decommissioned (see "History" below).
+
+## Model (as built)
+
+```
+your laptop (anywhere) ── ssh -p 52222 ──▶ edge router 192.168.1.1
+                                              │ WAN tcp/52222 ─▶ 192.168.1.127:52222
+                                              ▼
+                                       Proxmox host 192.168.1.127
+                                          sshd :52222 (key-only, break-glass key ONLY)
+                                          → full LAN via ssh -J / ssh -D
+```
+
+- **No port-knock.** Plain `ssh -p 52222`. The SSH key is the only gate.
+- **Key-only**, brute-force-proof. The exposed `:52222` trusts **only** the
+  dedicated break-glass key (`/root/.ssh/authorized_keys.breakglass`), separate
+  from root's normal LAN-admin keys, so it is independently revocable and a leak
+  of any other root key does not grant internet access.
+- **Rate-limited** per source IP (iptables hashlimit) + **fail2ban**. These trim
+  scanner noise only; key-only auth is the real protection.
+- **Exposed, not hidden.** `:52222` answers on the WAN (Shodan-visible). This is
+  a deliberate, documented exception to the Wave-1 "no public-IP access" policy
+  (see `docs/architecture/security.md`), chosen for self-containment: it has **no
+  dependency on the cluster** (unlike Headscale/cloudflared) and nothing to
+  remember (unlike the old knock, whose sequence lived only in in-cluster Vault).
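+
+The per-source rate limit can be sketched as follows. This is not the contents of `scripts/breakglass-firewall.sh`: the chain name `BREAKGLASS` comes from the design doc, while the function name, the `6/minute` rate, the burst of 6, the hashlimit name, and the `192.168.1.0/24` LAN range are assumptions for illustration. The function only prints `iptables` commands so they can be reviewed before anything is applied.
+
+```bash
+#!/usr/bin/env bash
+# Sketch only: emits iptables commands for review; nothing is executed.
+# Assumes INPUT otherwise accepts the traffic once the chain returns.
+breakglass_rules() {
+  local port="${1:-52222}" lan="${2:-192.168.1.0/24}" rate="${3:-6/minute}"
+  cat <<EOF
+iptables -N BREAKGLASS
+iptables -A INPUT -p tcp --dport ${port} -m conntrack --ctstate NEW -j BREAKGLASS
+iptables -A BREAKGLASS -s ${lan} -j RETURN
+iptables -A BREAKGLASS -m hashlimit --hashlimit-name breakglass --hashlimit-mode srcip --hashlimit-upto ${rate} --hashlimit-burst 6 -j RETURN
+iptables -A BREAKGLASS -j DROP
+EOF
+}
+
+breakglass_rules 52222
+```
+
+Only new connections enter the chain: LAN sources return immediately, other sources return while they stay under their own per-IP budget (`--hashlimit-mode srcip`), and anything over budget is dropped. Established sessions are never counted.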
+
+## Secrets (Vault `secret/viktor`)
+
+| Key | Use |
+|---|---|
+| `breakglass_ssh_pubkey` | authorized on the host (`authorized_keys.breakglass`) |
+| `breakglass_ssh_privkey` | the private key (also on your laptop at `~/.ssh/breakglass_ed25519`) |
+
+The key has **no passphrase** (so it works in a true cold event without anything
+to recall). Treat the private key as the sole credential — guard the laptop copy.
+
+> Leftover: `breakglass_knock_sequence` is dead (knock decommissioned). It is
+> inert; remove it when you have a Vault token with the `patch` capability
+> (`vault kv patch` / merge-patch — the everyday token lacks it).
+
+## Connect
+
+Client `~/.ssh/config`:
+
+```
+Host breakglass
+    HostName viktorbarzin.ddns.net        # follows the dynamic WAN IP
+    Port 52222
+    User root
+    IdentityFile ~/.ssh/breakglass_ed25519
+    IdentitiesOnly yes
+```
+
+Then:
+
+```bash
+ssh breakglass                              # shell on the Proxmox host
+ssh -J breakglass root@10.0.20.1            # jump to pfSense (or any LAN host)
+ssh -D 1080 breakglass                      # SOCKS5 → reach any internal IP
+```
+
+There is **no `bg()` knock function** anymore — delete it from your shell rc if
+you added it under the old design.
+
+## Cold-event IP cheat sheet (cluster DNS is down)
+
+| Host | IP |
+|---|---|
+| Proxmox host | `192.168.1.127` |
+| pfSense | `10.0.20.1` (WAN `192.168.1.2`) |
+| k8s API | `10.0.20.100` |
+| Synology NAS | `192.168.1.13` (reach via `ssh -J breakglass`) |
+| edge router | `192.168.1.1` |
+
+## Deploy / re-provision the host config
+
+Source of truth lives in `infra/scripts/`. To (re)deploy:
+
+```bash
+# 1. break-glass key authorized for the exposed port
+PUB="$(vault kv get -field=breakglass_ssh_pubkey secret/viktor)"
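+# ${PUB:?...} aborts instead of overwriting the file with an empty line if the Vault read failed
+ssh root@192.168.1.127 "printf '%s\n' '${PUB:?empty pubkey, is Vault reachable?}' > /root/.ssh/authorized_keys.breakglass && chmod 600 /root/.ssh/authorized_keys.breakglass"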
+
+# 2. sshd drop-in (dual-port, Match-isolated) — validate before reload (anti-lockout)
+scp scripts/sshd-10-breakglass.conf root@192.168.1.127:/etc/ssh/sshd_config.d/10-breakglass.conf
+ssh root@192.168.1.127 'sshd -t && systemctl reload ssh'
+
+# 3. firewall (rate-limit) + boot unit
+scp scripts/breakglass-firewall.sh root@192.168.1.127:/usr/local/sbin/breakglass-firewall.sh
+ssh root@192.168.1.127 'chmod 0755 /usr/local/sbin/breakglass-firewall.sh && systemctl enable --now breakglass-firewall.service'
+
+# 4. fail2ban jail
+scp scripts/fail2ban-breakglass-sshd.local root@192.168.1.127:/etc/fail2ban/jail.d/breakglass-sshd.local
+ssh root@192.168.1.127 'systemctl restart fail2ban && fail2ban-client status sshd'
+```
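+
+After step 2, the Match isolation can be confirmed on the host itself, since `sshd -T -C` prints the effective configuration for a hypothetical connection. A sketch: `effective_setting` is a helper name introduced here, and the `host=` / `addr=` values in the usage comments are placeholders, not real clients.
+
+```bash
+#!/usr/bin/env bash
+# Sketch only: pull one effective setting out of `sshd -T` output read from stdin.
+# sshd -T prints keywords in lowercase, one "keyword value..." pair per line.
+effective_setting() {
+  local keyword="$1"
+  awk -v k="$keyword" '$1 == k { $1 = ""; sub(/^ /, ""); print; exit }'
+}
+
+# Usage on the Proxmox host (as root); compare the two local ports:
+#   sshd -T -C user=root,host=client,addr=203.0.113.7,lport=52222 | effective_setting authorizedkeysfile
+#   sshd -T -C user=root,host=client,addr=192.168.1.50,lport=22   | effective_setting authorizedkeysfile
+```
+
+If the drop-in is working, the `lport=52222` query should report only the break-glass file, while the `lport=22` query should report root's normal key files.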
+
+The `breakglass-firewall.service` unit (oneshot, `RemainAfterExit=yes`, ordered
+after `network-pre.target`) is a manual host unit that the steps above enable
+but do not copy; recreate it if the host is rebuilt:
+
+```ini
+[Unit]
+Description=Break-glass base firewall (key-only SSH on :52222)
+After=network-pre.target
+Wants=network-pre.target
+[Service]
+Type=oneshot
+ExecStart=/usr/local/sbin/breakglass-firewall.sh
+RemainAfterExit=yes
+[Install]
+WantedBy=multi-user.target
+```
+
+## Edge-router forward (manual — live device, not Terraform)
+
+TP-Link Archer AX6000 (`192.168.1.1`) → Advanced → NAT Forwarding → Port
+Forwarding. The break-glass rule:
+
+| Service Name | Device IP | External Port | Internal Port | Protocol |
+|---|---|---|---|---|
+| `breakglass-ssh` | `192.168.1.127` | `52222` | `52222` | TCP |
+
+**AX6000 quirks (learned 2026-06-11 — do not relearn the hard way):**
+- **External port must equal internal port.** The firmware rejects any remap
+  (e.g. `22 → 52222`) with *"External Port: This item conflicts with existed
+  ones."* Hence ext==int 52222.
+- **Port 22 is reserved** — even `22 → 22` is refused. Break-glass cannot use 22.
+- **Row delete is immediate** (no confirm dialog) — clicking the trash icon
+  removes the rule and toasts "Operation succeeded".
+- Automation: `~/wizard/tools/insecure-browse/add-forward.{sh,js}` (dockerized
+  Playwright; double-gated save `DRY_RUN=0 CONFIRM_SAVE=1`; supports
+  `RULES_JSON` add, `EDIT_RULES_JSON` protocol-edit, `DELETE_RULES_JSON`
+  identity-guarded delete). Router password: Vault
+  `secret/viktor/edge_router_192_168_1_1_password`.
+
+## Rotate / revoke
+
+- **Revoke instantly:** remove the line from `/root/.ssh/authorized_keys.breakglass`.
+- **Rotate the key:** `ssh-keygen -t ed25519 -a 100 -f ~/.ssh/breakglass_ed25519`,
+  `vault kv patch secret/viktor breakglass_ssh_privkey=@... breakglass_ssh_pubkey=...`,
+  redeploy step 1 above.
+- **Router reset wipes forwards:** re-add the `breakglass-ssh` rule above.
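+
+The "revoke instantly" step can be scripted so it does not depend on hand-editing under pressure. A minimal sketch: `revoke_key` is a hypothetical helper, and the `.bak` backup is an added assumption, not part of the as-built setup.
+
+```bash
+#!/usr/bin/env bash
+# Sketch only: remove every line containing the given key blob from an
+# authorized_keys-style file, leaving a .bak copy of the previous contents.
+revoke_key() {
+  local file="$1" blob="$2" tmp
+  tmp="$(mktemp)" || return 1
+  cp -p "$file" "${file}.bak" || return 1
+  # grep exits 1 when no lines remain; an empty result is valid here
+  grep -vF -- "$blob" "$file" > "$tmp"
+  # write through the existing file so its owner and mode are preserved
+  cat "$tmp" > "$file" && rm -f "$tmp"
+}
+
+# Usage on the Proxmox host (the blob is the base64 field of the public key):
+#   revoke_key /root/.ssh/authorized_keys.breakglass 'AAAAC3...'
+```
+
+sshd reads the file on each authentication attempt, so no reload is needed for the revocation to take effect; sessions that are already open stay open and have to be closed separately.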
+
+## History
+
+- **2026-05-30:** original design — key-only SSH on `:52222` gated behind a
+  **UDP port-knock** (knockd). Decommissioned 2026-06-11: the knock added no real
+  security (the SSH key already makes the port brute-force-proof) and its only
+  benefit — hiding the port — came at the cost of a **circular dependency**: the
+  knock sequence lived only in in-cluster Vault, unreachable in the exact
+  cold/away scenario break-glass exists for. That caused a real lockout. The
+  knockd package + config + the legacy Synology SSH forward (ext 3333 → .13:22)
+  were removed.