Break-glass SSH — Redesign
- Date: 2026-06-11
- Status: Implemented
- Owner: Viktor
- Supersedes: `2026-05-30-breakglass-ssh-access-{design,plan}.md` (port-knock design)
- As-built runbook: `docs/runbooks/breakglass-ssh.md`
Why redesign
The 2026-05-30 design gated a key-only SSH port on the Proxmox host behind a UDP port-knock (knockd). It caused a real lockout, for a structural reason:
- The knock sequence was 3 random ports stored only in Vault, and the client helper fetched it from Vault at connect time.
- Vault is in-cluster and not publicly reachable (Wave-1 policy). In exactly the scenario break-glass exists for (away from home, cluster/tunnels down), the knock sequence can be neither fetched nor recalled from memory: a circular dependency.
The knock's only benefit was hiding an already brute-force-proof port; its cost was that fragility. For a recovery path, robustness beats stealth.
Decision
Plain key-only SSH to the Proxmox host on `:52222`, openly reachable, no knock.
Hardening: the exposed port trusts only a dedicated break-glass key
(`Match LocalPort`), so a leak of any other root key does not grant internet
access; connections are rate-limited per source (iptables `hashlimit`); and
fail2ban bans repeat offenders. Scenario covered: cluster and tunnels down,
host, pfSense and router up (the common "I'm away and need in" case, confirmed
with Viktor). Deeper tiers ("pfSense wedged", "host down") are explicitly out of scope.
Alternatives considered and rejected: keeping the knock (fragile, circular); Tailscale on pfSense (briefly chosen, then dropped: it reintroduces the upstream dependency that Headscale is self-hosted to avoid, and Viktor preferred a self-contained stock-ssh path); WireGuard road-warrior (needs a client, and the self-contained SSH path was preferred).
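To make the per-source rate-limiting concrete: iptables `hashlimit` keeps one token bucket per hash key, here the source IP. The Python model below is a sketch of that idea, not the actual `BREAKGLASS` rules; the limits (4 new connections/min sustained, burst of 3) are hypothetical, and the real values live in `scripts/breakglass-firewall.sh`.

```python
from dataclasses import dataclass, field


@dataclass
class PerSourceLimiter:
    """Token bucket per source IP, modeling iptables hashlimit keyed on srcip."""
    rate_per_min: float  # sustained new connections per minute, per source
    burst: int           # bucket capacity: back-to-back connections allowed
    buckets: dict = field(default_factory=dict)  # src -> (tokens, last_time_s)

    def allow(self, src: str, now: float) -> bool:
        """True if a new connection from `src` at `now` (seconds) is admitted."""
        tokens, last = self.buckets.get(src, (float(self.burst), now))
        # Refill in proportion to elapsed time, never above the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate_per_min / 60.0)
        if tokens >= 1.0:
            self.buckets[src] = (tokens - 1.0, now)  # admitted: spend one token
            return True
        self.buckets[src] = (tokens, now)  # refused: nothing spent
        return False


# Hypothetical limits: 4/min sustained, burst of 3.
lim = PerSourceLimiter(rate_per_min=4, burst=3)
print([lim.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(lim.allow("198.51.100.9", 3))   # True: separate bucket per source
print(lim.allow("203.0.113.7", 18))   # True: about 1 token refilled over 15 s
```

Because each source drains only its own bucket, a scanner hammering `:52222` does not delay a legitimate login from a different address.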
Components
| Layer | Change | Source of truth |
|---|---|---|
| sshd | dual-port `:22` (LAN, all keys) + `:52222` (WAN, break-glass key only via `Match LocalPort`, terminated by `Match all`); key-only everywhere | `scripts/sshd-10-breakglass.conf` |
| host firewall | `BREAKGLASS` chain: `:52222` rate-limited per source, LAN bypass; replaced the knock-gated default-DROP | `scripts/breakglass-firewall.sh` (+ `breakglass-firewall.service`) |
| fail2ban | jail fixed for Debian 13: journalmatch by unit, not `_COMM=sshd` (auth lines come from `sshd-session`, so the stock match never bans); bans on `:22` + `:52222` | `scripts/fail2ban-breakglass-sshd.local` |
| knockd | removed (package purged, config deleted) | none |
| edge router | `breakglass-ssh`: WAN tcp/52222 → 192.168.1.127:52222; removed the legacy Synology SSH forward (ext 3333 → .13:22) | manual (live device) |
| Vault | `breakglass_ssh_{pub,priv}key` retained; `breakglass_knock_sequence` now dead | `secret/viktor` |
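Why the sshd drop-in ends with `Match all`, as a toy model: per sshd_config(5), keywords after a `Match` line apply only to matching connections, until the next `Match` line or end of file, and they override the global values. Under the simplifying assumption that the drop-in's lines are read as if inlined ahead of the rest of `sshd_config`, an unterminated `Match LocalPort 52222` would capture every later directive. This sketch is not sshd's parser: it understands only `Match LocalPort N` and `Match all`, ignores which keywords sshd permits inside `Match`, and the directive values used below are hypothetical.

```python
def effective(lines: list[str], local_port: int) -> dict[str, str]:
    """Effective settings for a connection arriving on `local_port` (toy model)."""
    global_opts: dict[str, str] = {}
    match_opts: dict[str, str] = {}
    active = None  # None: global section; True/False: inside a Match that does/doesn't apply
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key, value = parts[0], (parts[1].strip() if len(parts) > 1 else "")
        if key.lower() == "match":
            crit = value.split()
            if [c.lower() for c in crit] == ["all"]:
                active = True
            elif len(crit) == 2 and crit[0].lower() == "localport":
                active = int(crit[1]) == local_port
            else:
                raise ValueError(f"unsupported Match criteria: {value!r}")
        elif active is None:
            global_opts.setdefault(key, value)  # global section: first value wins
        elif active:
            match_opts.setdefault(key, value)   # applicable Match: first value wins
        # active is False: line belongs to a Match that does not apply; skip it
    return {**global_opts, **match_opts}        # Match values override global ones


DROPIN = [  # hypothetical, not the real sshd-10-breakglass.conf
    "Match LocalPort 52222",
    "    AuthorizedKeysFile /root/.ssh/authorized_keys.breakglass",
]
REST = ["PermitRootLogin prohibit-password"]  # hypothetical later line in sshd_config

print(effective(DROPIN + REST, 22))                  # {}: swallowed by the Match block
print(effective(DROPIN + ["Match all"] + REST, 22))  # {'PermitRootLogin': 'prohibit-password'}
```

The second call is the "`Match all` reset" checked during verification: the trailing directive applies to `:22` again.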
Edge-router constraints discovered (TP-Link AX6000)
- No port remapping: the external port must equal the internal port (e.g. `22 → 52222` is rejected as a "conflict"). All forwards are ext==int, hence `:52222` on both sides.
- Port 22 is reserved: `22 → 22` is also refused. Break-glass cannot use 22 (Viktor's initial preference); `:52222` is the landed port.
- Row delete is immediate (no confirm dialog).
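The two forwarding constraints can be written down as a pre-check before editing the live device. This is a hypothetical helper encoding only what was observed above, not the router's firmware logic.

```python
from typing import Optional


def forward_error(external: int, internal: int) -> Optional[str]:
    """Why the AX6000 (as observed) would refuse an ext -> int forward, or None."""
    if external != internal:
        return f"{external} -> {internal}: remapping rejected as a conflict"
    if external == 22:
        return f"{external} -> {internal}: port 22 is reserved"
    return None


print(forward_error(22, 52222))     # refused: remap
print(forward_error(22, 22))        # refused: reserved port
print(forward_error(52222, 52222))  # None: the landed break-glass forward
```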
Security posture
- Brute force: infeasible (key-only, no password).
- Scannable: yes, as a deliberate, documented Wave-1 exception (`security.md`).
- Residual risks: an sshd 0-day during exposure (mitigated by patching, rate-limiting, fail2ban and a low `MaxAuthTries`); theft of the break-glass key (revoke by removing its line from `authorized_keys.breakglass`).
- Logins are audited: PVE ships sshd auth logs and snoopy execve logs to Loki.
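Revoking a stolen break-glass key means deleting one line from `authorized_keys.breakglass`. A minimal sketch, assuming that file holds one key per line and that the key is identified by its base64 blob (unique per key, unlike the comment). It writes a temp file with the original permissions and renames it over the original, so sshd never reads a half-written file. The path in the usage comment is hypothetical.

```python
from pathlib import Path


def without_key(text: str, key_blob: str) -> str:
    """Drop every authorized_keys line that has `key_blob` as one of its fields."""
    return "".join(line for line in text.splitlines(keepends=True)
                   if key_blob not in line.split())


def revoke(path: Path, key_blob: str) -> bool:
    """Rewrite `path` without the key; return True if anything was removed."""
    old = path.read_text()
    new = without_key(old, key_blob)
    if new == old:
        return False
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(new)
    tmp.chmod(path.stat().st_mode & 0o777)  # keep the original mode (e.g. 0600)
    tmp.replace(path)  # rename is atomic on the same filesystem
    return True


# Fake key blobs, for illustration only.
keys = "ssh-ed25519 AAAAC3bg breakglass\nssh-ed25519 AAAAC3ad admin\n"
print(without_key(keys, "AAAAC3bg"), end="")  # ssh-ed25519 AAAAC3ad admin
# Hypothetical path: revoke(Path("/root/.ssh/authorized_keys.breakglass"), "<blob>")
```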
Verification (2026-06-11)
- `:52222` reachable; break-glass key authenticates (`root@pve`).
- Non-break-glass keys rejected on `:52222` (`Match` isolation works).
- `:22` LAN admin unaffected (`Match all` reset confirmed; global root login intact).
- Full WAN path: `ssh -p 52222 <WAN-IP>` with the break-glass key → `root@pve`.
- knockd gone; fail2ban jail matches Debian 13 `sshd-session` lines.
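The fail2ban fix, as a toy model: journald matches compare fields for exact equality, so a `_COMM=sshd` match only sees lines logged by the `sshd` listener process, while on Debian 13 the auth lines come from `sshd-session`. Matching by unit catches both. The entries below are hypothetical, and `ssh.service` as the unit name is an assumption, not copied from the jail file.

```python
# Hypothetical journald entries (field -> value), trimmed to what matters here.
ENTRIES = [
    {"_COMM": "sshd", "_SYSTEMD_UNIT": "ssh.service",
     "MESSAGE": "Server listening on 0.0.0.0 port 52222."},
    {"_COMM": "sshd-session", "_SYSTEMD_UNIT": "ssh.service",
     "MESSAGE": "Invalid user admin from 203.0.113.7 port 40022"},
]


def journal_match(entries: list[dict], **fields: str) -> list[str]:
    """MESSAGEs of entries where every given field equals the requested value."""
    return [e["MESSAGE"] for e in entries
            if all(e.get(k) == v for k, v in fields.items())]


print(journal_match(ENTRIES, _COMM="sshd"))                 # listener line only
print(journal_match(ENTRIES, _SYSTEMD_UNIT="ssh.service"))  # includes the auth failure
```

With the `_COMM=sshd` match the failregex never sees an auth failure, which is why the stock jail never banned.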