t3: connection logging across the path for drop attribution
Viktor asked to add connection logs (Traefik/Cloudflare) to catch the
real-path t3 WS drops: a direct-to-t3-serve browser ran 40 min clean
while real tunnel sessions cycle every 15-35s, so the drop originates
above t3-serve and we need to see which layer cuts the socket.
Traefik (/ws duration) and cloudflared (WS close events) already ship to
Loki; the gap was the devvm side. This adds:
- t3-dispatch logs every /ws open/close with dur_ms + cause:
downstream_closed (client/CF/Traefik hung up = last-mile/network),
upstream_closed (t3-serve closed/reset), or graceful. Graceful closes
previously left no trace (default ReverseProxy only logs on error), so a
watchdog-driven reconnect was invisible. Helpers unit-tested.
- devvm-promtail.{yaml,service}: ships devvm journald (t3-dispatch +
t3-serve@<user>) to cluster Loki as job=devvm-journal, mirroring the
pve/rpi-sofia shippers. devvm was never in Loki (standalone VM).
Joined in Loki the three layers attribute any future drop to a segment
with no repro needed. Runbook + service-catalog updated.
This commit is contained in:
parent
933e4649fb
commit
9b19caff47
7 changed files with 231 additions and 3 deletions
|
|
@ -32,7 +32,7 @@
|
||||||
|---------|-------------|-------|
|
|---------|-------------|-------|
|
||||||
| k8s-dashboard | Kubernetes dashboard at `k8s.viktorbarzin.me`. **Forward-auth + auto-injected SA token** (apiserver OIDC blocked, see design §12). nginx token-injector (`dashboard_injector.tf`) maps `X-authentik-username` → the user's `dashboard-<user>` SA token (ns admin + read-only on namespace-list/nodes only via `dashboard-nav-readonly` — no cross-tenant reads, `rbac/.../dashboard-sa.tf`; admins → cluster-admin SA) and sets `Authorization: Bearer` → no token-paste, dashboard auto-authenticates per user. Forward-auth admits `kubernetes-*` groups for this host (`stacks/authentik/admin-services-restriction.tf`). oauth2-proxy + `k8s-dashboard` OIDC app built but idle. | k8s-dashboard |
|
| k8s-dashboard | Kubernetes dashboard at `k8s.viktorbarzin.me`. **Forward-auth + auto-injected SA token** (apiserver OIDC blocked, see design §12). nginx token-injector (`dashboard_injector.tf`) maps `X-authentik-username` → the user's `dashboard-<user>` SA token (ns admin + read-only on namespace-list/nodes only via `dashboard-nav-readonly` — no cross-tenant reads, `rbac/.../dashboard-sa.tf`; admins → cluster-admin SA) and sets `Authorization: Bearer` → no token-paste, dashboard auto-authenticates per user. Forward-auth admits `kubernetes-*` groups for this host (`stacks/authentik/admin-services-restriction.tf`). oauth2-proxy + `k8s-dashboard` OIDC app built but idle. | k8s-dashboard |
|
||||||
| reverse-proxy | Generic reverse proxy | reverse-proxy |
|
| reverse-proxy | Generic reverse proxy | reverse-proxy |
|
||||||
| t3code | Multi-user coding-agent GUI at t3.viktorbarzin.me. `auth=required` (Authentik) → DevVM `t3-dispatch` service (`10.0.10.10:3780`, unprivileged user) maps `X-authentik-username` → that user's own `t3-serve@<u>` instance (file perms enforced by uid; wizard→:3773, emo→:3774; unmapped→403) and **auto-injects the t3 session on first visit** (mints via the root `t3-mint` wrapper, scoped sudoers → `/api/auth/bootstrap` `t3_session` cookie). **Source of truth = `infra/scripts/workstation/roster.yaml`** (os_user → authentik_user/k8s_user/tier/namespaces); `roster_engine.py` (pytest-covered) derives desired state and `t3-provision-users` (hourly systemd timer) applies it — constrained accounts, additive per-tier groups, `t3-serve@<u>` instances, and **regenerating** `/etc/ttyd-user-map` + `dispatch.json` (those two are now GENERATED — do not hand-edit). New non-admins inherit wizard's Claude config (machine-wide managed `claudeMd` in `/etc/claude-code/managed-settings.json` + per-user `~/.claude/{skills,rules}` symlinks seeded by `/etc/skel`) and get a **writable git-crypt-LOCKED** infra clone at `~/code` (code plaintext, secret files ciphertext). Tiers: admin / power-user (cluster-wide read-only) / namespace-owner. **Add a user:** one entry in `roster.yaml` → reconcile. Per-user OIDC kubeconfig, the `oidc-power-user-readonly` ClusterRole, and the Authentik `T3 Users` edge gate are applied (the gate is live — only `T3 Users` members reach t3); the emo cutover to his own locked clone is the remaining gated step. DevVM artifacts versioned in `infra/scripts/` (`t3-serve@.service`, `t3-provision-users` + `workstation/{roster.yaml,roster_engine.py,setup-devvm.sh,managed-settings.json,skel/}`, `t3-dispatch/`, `t3-mint`, `sudoers-t3-autopair`, `t3-autoupdate.*`); TF (`stacks/t3code`) owns only the ingress + Endpoints→:3780. **t3 binary is PINNED** (`T3_PIN`, currently `0.0.24`) — `t3-autoupdate` is a daily *enforcer* that re-asserts the pin (a no-op when correct; restarts only idle instances), NOT a nightly tracker. It used to track `nightly`, but on 2026-06-09 a nightly bump migrated every `~/.t3/state.sqlite` forward (`role`→`scopes`) and changed the bootstrap API, breaking pairing for ALL users (post-mortem `2026-06-09-t3-nightly-autoupdate-auth-outage.md`). t3 is pre-1.0, so moving the pin is a deliberate, reversible step via `docs/runbooks/t3-version-bump.md` (pre-bump `state.sqlite` backup → bump `T3_PIN` → enforcer install with a REAL pairing health-check that auto-rolls-back → verify → restore). Pin set in `t3-autoupdate.sh` + `setup-devvm.sh` (keep in sync). `t3-dispatch` is **version-agnostic** (2026-06-09): `autoPair` tries `/api/auth/browser-session` (0.0.25) then falls back to `/api/auth/bootstrap` (0.0.24), so 0.0.24↔0.0.25 needs no dispatch change. `~/.t3` is backed up daily by `t3-backup-state` (online `VACUUM INTO`; previously unbacked — it's the only copy). Native app/app.t3.codes unsupported (cross-origin) — deferred until published. Design: `docs/plans/2026-06-01-t3-auto-provision-*`. **Drop attribution (2026-06-10):** `t3-probe` Deployment (same ns) holds differential legs — `cloudflare` (full public path via DoH-pinned DNS), `internal` (Traefik LB only), `t3serve` (devvm:3773 direct) — against dispatch's unauthenticated `/probe` carve-out (walloff-guarded); Prometheus job `t3-probe`, alerts `T3ProbeLegDown`/`T3ProbeDropBurst`, runbook `docs/runbooks/t3-drop-attribution.md`. `t3-serve@` units carry memory containment (`MemoryHigh=12G/MemoryMax=16G/MemorySwapMax=0/OOMPolicy=continue`) so a runaway agent OOMs alone instead of freezing devvm. | t3code |
|
| t3code | Multi-user coding-agent GUI at t3.viktorbarzin.me. `auth=required` (Authentik) → DevVM `t3-dispatch` service (`10.0.10.10:3780`, unprivileged user) maps `X-authentik-username` → that user's own `t3-serve@<u>` instance (file perms enforced by uid; wizard→:3773, emo→:3774; unmapped→403) and **auto-injects the t3 session on first visit** (mints via the root `t3-mint` wrapper, scoped sudoers → `/api/auth/bootstrap` `t3_session` cookie). **Source of truth = `infra/scripts/workstation/roster.yaml`** (os_user → authentik_user/k8s_user/tier/namespaces); `roster_engine.py` (pytest-covered) derives desired state and `t3-provision-users` (hourly systemd timer) applies it — constrained accounts, additive per-tier groups, `t3-serve@<u>` instances, and **regenerating** `/etc/ttyd-user-map` + `dispatch.json` (those two are now GENERATED — do not hand-edit). New non-admins inherit wizard's Claude config (machine-wide managed `claudeMd` in `/etc/claude-code/managed-settings.json` + per-user `~/.claude/{skills,rules}` symlinks seeded by `/etc/skel`) and get a **writable git-crypt-LOCKED** infra clone at `~/code` (code plaintext, secret files ciphertext). Tiers: admin / power-user (cluster-wide read-only) / namespace-owner. **Add a user:** one entry in `roster.yaml` → reconcile. Per-user OIDC kubeconfig, the `oidc-power-user-readonly` ClusterRole, and the Authentik `T3 Users` edge gate are applied (the gate is live — only `T3 Users` members reach t3); the emo cutover to his own locked clone is the remaining gated step. DevVM artifacts versioned in `infra/scripts/` (`t3-serve@.service`, `t3-provision-users` + `workstation/{roster.yaml,roster_engine.py,setup-devvm.sh,managed-settings.json,skel/}`, `t3-dispatch/`, `t3-mint`, `sudoers-t3-autopair`, `t3-autoupdate.*`); TF (`stacks/t3code`) owns only the ingress + Endpoints→:3780. **t3 binary is PINNED** (`T3_PIN`, currently `0.0.24`) — `t3-autoupdate` is a daily *enforcer* that re-asserts the pin (a no-op when correct; restarts only idle instances), NOT a nightly tracker. It used to track `nightly`, but on 2026-06-09 a nightly bump migrated every `~/.t3/state.sqlite` forward (`role`→`scopes`) and changed the bootstrap API, breaking pairing for ALL users (post-mortem `2026-06-09-t3-nightly-autoupdate-auth-outage.md`). t3 is pre-1.0, so moving the pin is a deliberate, reversible step via `docs/runbooks/t3-version-bump.md` (pre-bump `state.sqlite` backup → bump `T3_PIN` → enforcer install with a REAL pairing health-check that auto-rolls-back → verify → restore). Pin set in `t3-autoupdate.sh` + `setup-devvm.sh` (keep in sync). `t3-dispatch` is **version-agnostic** (2026-06-09): `autoPair` tries `/api/auth/browser-session` (0.0.25) then falls back to `/api/auth/bootstrap` (0.0.24), so 0.0.24↔0.0.25 needs no dispatch change. `~/.t3` is backed up daily by `t3-backup-state` (online `VACUUM INTO`; previously unbacked — it's the only copy). Native app/app.t3.codes unsupported (cross-origin) — deferred until published. Design: `docs/plans/2026-06-01-t3-auto-provision-*`. **Drop attribution (2026-06-10):** `t3-probe` Deployment (same ns) holds differential legs — `cloudflare` (full public path via DoH-pinned DNS), `internal` (Traefik LB only), `t3serve` (devvm:3773 direct) — against dispatch's unauthenticated `/probe` carve-out (walloff-guarded); Prometheus job `t3-probe`, alerts `T3ProbeLegDown`/`T3ProbeDropBurst`, runbook `docs/runbooks/t3-drop-attribution.md`. `t3-serve@` units carry memory containment (`MemoryHigh=12G/MemoryMax=16G/MemorySwapMax=0/OOMPolicy=continue`) so a runaway agent OOMs alone instead of freezing devvm. **Connection logs (2026-06-11):** `t3-dispatch` logs every `/ws` open/close with `dur_ms` + `cause` (`downstream_closed`=client/CF/Traefik hung up → last-mile; `upstream_closed`=t3-serve closed; `graceful`); devvm journald now ships to Loki via `scripts/devvm-promtail.*` (`{job="devvm-journal"}` + `{job="sshd-devvm"}`), joining Traefik `/ws`-duration + cloudflared close events already in Loki for full per-drop attribution without a repro. **Empirical (2026-06-11):** direct-to-t3-serve held one WS 40 min (0 drops) while a real tunnel session cycled 5×/90s → drop originates above t3-serve on the public path, NOT in t3-serve itself; `t3 auth pairing create`+`/api/auth/browser-session` works but dispatch **auto-pair is 401-broken on v0.0.26** (latent; live 30-day cookies mask it). | t3code |
|
||||||
|
|
||||||
## Active Use
|
## Active Use
|
||||||
| Service | Description | Stack |
|
| Service | Description | Stack |
|
||||||
|
|
|
||||||
|
|
@ -35,6 +35,41 @@ Attribution table:
|
||||||
|
|
||||||
Alerts `T3ProbeLegDown` / `T3ProbeDropBurst` fire on sustained breakage.
|
Alerts `T3ProbeLegDown` / `T3ProbeDropBurst` fire on sustained breakage.
|
||||||
|
|
||||||
|
## 1b. Connection logs in Loki (passive, always-on — catch a real drop)
|
||||||
|
|
||||||
|
Three layers of the real path log every t3 `/ws` connection to Loki, so a drop
|
||||||
|
the user actually experienced is attributable after the fact without a repro. A
|
||||||
|
drop is **a short-lived `/ws` connection** (a healthy session holds one socket
|
||||||
|
for hours); the client's 20s heartbeat watchdog reconnects on any break.
|
||||||
|
|
||||||
|
| Layer | Loki stream | What it tells you |
|
||||||
|
|---|---|---|
|
||||||
|
| Traefik | `{job="traefik"}` ⟶ filter `t3code-t3` + `GET /ws` | per-connection **duration** (trailing `…ms`) + edge (cloudflared pod) IP |
|
||||||
|
| cloudflared | `{job="cloudflared"}` ⟶ filter `t3.viktorbarzin.me/ws` | CF-tunnel-side close (`ended abruptly: context canceled` = browser/CF side hung up) |
|
||||||
|
| t3-dispatch | `{job="devvm-journal",unit="t3-dispatch.service"} \|= "ws close"` | **`dur_ms` + `cause`** — the discriminator below |
|
||||||
|
|
||||||
|
`cause` on the dispatch `ws close` line:
|
||||||
|
- **`downstream_closed`** — client / Cloudflare / Traefik tore the socket down
|
||||||
|
(`context canceled`). Short `dur_ms` = client watchdog firing → a **last-mile /
|
||||||
|
network-quality** drop (or CF/tunnel blip); t3-serve was fine.
|
||||||
|
- **`upstream_closed`** — the user's `t3 serve` closed/reset (reset by peer / EOF
|
||||||
|
/ refused) → t3-serve stall/restart/OOM.
|
||||||
|
- **`graceful`** — clean close from either side (e.g. the client watchdog's
|
||||||
|
`disconnect()` after a >20s heartbeat gap). Cross-check `dur_ms`: a ~20s+
|
||||||
|
graceful close with no devvm pressure spike (§3) is a heartbeat-timeout whose
|
||||||
|
stall was NOT on devvm → last-mile.
|
||||||
|
|
||||||
|
Triage query (Grafana Explore → Loki) — every short t3 socket in a window:
|
||||||
|
|
||||||
|
```logql
|
||||||
|
{job="devvm-journal", unit="t3-dispatch.service"} |= "ws close"
|
||||||
|
| regexp `dur_ms=(?P<dur>[0-9]+) cause=(?P<cause>\S+)` | dur < 120000
|
||||||
|
```
|
||||||
|
|
||||||
|
Line the timestamp up against `{job="traefik"}` (duration + edge IP) and
|
||||||
|
`{job="cloudflared"}` (CF-side close) for the same second to localise the layer.
|
||||||
|
devvm journald (incl. `t3-serve@<user>`) ships via `scripts/devvm-promtail.*`.
|
||||||
|
|
||||||
## 2. Server-side log recipe (per-event forensics)
|
## 2. Server-side log recipe (per-event forensics)
|
||||||
|
|
||||||
On devvm (timestamps in UTC):
|
On devvm (timestamps in UTC):
|
||||||
|
|
|
||||||
17
scripts/devvm-promtail.service
Normal file
17
scripts/devvm-promtail.service
Normal file
|
|
@ -0,0 +1,17 @@
|
||||||
|
# systemd unit for promtail on the devvm (10.0.10.10). Install to
|
||||||
|
# /etc/systemd/system/promtail.service. See scripts/devvm-promtail.yaml for the full deploy.
|
||||||
|
[Unit]
|
||||||
|
Description=Promtail (ships devvm journal -> cluster Loki)
|
||||||
|
After=network-online.target
|
||||||
|
Wants=network-online.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yml
|
||||||
|
Restart=on-failure
|
||||||
|
RestartSec=5
|
||||||
|
User=root
|
||||||
|
Group=root
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
59
scripts/devvm-promtail.yaml
Normal file
59
scripts/devvm-promtail.yaml
Normal file
|
|
@ -0,0 +1,59 @@
|
||||||
|
# Promtail config for the devvm (10.0.10.10) — ships the systemd journal to cluster Loki.
|
||||||
|
#
|
||||||
|
# devvm is a standalone VM (NOT a k8s node), so its journal — including the t3
|
||||||
|
# stack (t3-dispatch, t3-serve@<user>) — was never in Loki. Added 2026-06-11 for
|
||||||
|
# t3 drop forensics: t3-dispatch now logs each /ws connection's open/close with
|
||||||
|
# duration + which side hung up (downstream_closed = client/CF/Traefik went away;
|
||||||
|
# upstream_closed = t3-serve closed/stalled; graceful = clean close). Joined with
|
||||||
|
# Traefik's per-/ws duration (already in Loki) this attributes every drop to a layer.
|
||||||
|
#
|
||||||
|
# NOT Terraform-managed (devvm is outside k8s) — same hand-deployed pattern as
|
||||||
|
# scripts/pve-promtail.* and the rpi-sofia promtail. This file is source-of-truth.
|
||||||
|
#
|
||||||
|
# Deploy (on devvm, as root via sudo):
|
||||||
|
# sudo install -d -m 0755 /etc/promtail /var/lib/promtail
|
||||||
|
# sudo install -m 0644 scripts/devvm-promtail.yaml /etc/promtail/config.yml
|
||||||
|
# sudo install -m 0644 scripts/devvm-promtail.service /etc/systemd/system/promtail.service
|
||||||
|
# # Binary: grafana/loki v3.5.1 promtail-linux-amd64 -> /usr/local/bin/promtail (chmod 0755).
|
||||||
|
# sudo systemctl daemon-reload && sudo systemctl enable --now promtail
|
||||||
|
# # Loki reach: loki.viktorbarzin.lan (Technitium CNAME -> live Traefik LB; insecure cert).
|
||||||
|
#
|
||||||
|
# Streams produced:
|
||||||
|
# {job="devvm-journal"} — full devvm journal
|
||||||
|
# {job="devvm-journal", unit="t3-dispatch.service"} — dispatch (ws open/close lines)
|
||||||
|
# {job="devvm-journal", unit="t3-serve@wizard.service"} — per-user t3 serve
|
||||||
|
# {job="sshd-devvm"} — sshd auth lines (parity with sshd-pve)
|
||||||
|
server:
|
||||||
|
http_listen_port: 9080
|
||||||
|
grpc_listen_port: 0
|
||||||
|
log_level: warn
|
||||||
|
|
||||||
|
positions:
|
||||||
|
filename: /var/lib/promtail/positions.yaml
|
||||||
|
|
||||||
|
clients:
|
||||||
|
- url: https://loki.viktorbarzin.lan/loki/api/v1/push
|
||||||
|
tls_config:
|
||||||
|
insecure_skip_verify: true
|
||||||
|
|
||||||
|
scrape_configs:
|
||||||
|
- job_name: journal
|
||||||
|
journal:
|
||||||
|
max_age: 12h
|
||||||
|
json: false
|
||||||
|
path: /var/log/journal
|
||||||
|
labels:
|
||||||
|
host: devvm
|
||||||
|
job: devvm-journal
|
||||||
|
relabel_configs:
|
||||||
|
- source_labels: ['__journal__systemd_unit']
|
||||||
|
target_label: unit
|
||||||
|
- source_labels: ['__journal_priority_keyword']
|
||||||
|
target_label: level
|
||||||
|
- source_labels: ['__journal_syslog_identifier']
|
||||||
|
target_label: identifier
|
||||||
|
# sshd auth lines -> job=sshd-devvm (parity with the pve shipper's sshd-pve).
|
||||||
|
- source_labels: ['__journal_syslog_identifier']
|
||||||
|
regex: 'sshd.*'
|
||||||
|
target_label: job
|
||||||
|
replacement: 'sshd-devvm'
|
||||||
|
|
@ -2,4 +2,4 @@ module t3-dispatch
|
||||||
|
|
||||||
go 1.22
|
go 1.22
|
||||||
|
|
||||||
require github.com/gorilla/websocket v1.5.3 // indirect
|
require github.com/gorilla/websocket v1.5.3
|
||||||
|
|
|
||||||
|
|
@ -212,7 +212,64 @@ func handler(w http.ResponseWriter, r *http.Request) {
|
||||||
}
|
}
|
||||||
// Steady state: reverse-proxy (incl. WebSocket upgrade) to the user's instance.
|
// Steady state: reverse-proxy (incl. WebSocket upgrade) to the user's instance.
|
||||||
target, _ := url.Parse(fmt.Sprintf("http://127.0.0.1:%d", e.Port))
|
target, _ := url.Parse(fmt.Sprintf("http://127.0.0.1:%d", e.Port))
|
||||||
httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
|
proxy := httputil.NewSingleHostReverseProxy(target)
|
||||||
|
|
||||||
|
// WebSocket connection logging: t3 drops manifest as the client's 20s
|
||||||
|
// heartbeat watchdog reconnecting, so a flood of short-lived /ws connections
|
||||||
|
// IS the symptom. Log each WS open + close (duration + which side hung up) so
|
||||||
|
// a drop is attributable from logs alone — graceful closes otherwise leave no
|
||||||
|
// trace (the default ReverseProxy only logs on error). cause stays "graceful"
|
||||||
|
// unless ErrorHandler fires; ErrorHandler runs within ServeHTTP, so reading
|
||||||
|
// cause after ServeHTTP returns needs no synchronisation.
|
||||||
|
if isWebSocket(r) {
|
||||||
|
start := time.Now()
|
||||||
|
ip := clientIP(r)
|
||||||
|
cause := "graceful"
|
||||||
|
proxy.ErrorHandler = func(rw http.ResponseWriter, _ *http.Request, err error) {
|
||||||
|
cause = classifyClose(err)
|
||||||
|
}
|
||||||
|
log.Printf("ws open user=%s ip=%s", e.OsUser, ip)
|
||||||
|
proxy.ServeHTTP(w, r)
|
||||||
|
log.Printf("ws close user=%s ip=%s dur_ms=%d cause=%s",
|
||||||
|
e.OsUser, ip, time.Since(start).Milliseconds(), cause)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
proxy.ServeHTTP(w, r)
|
||||||
|
}
|
||||||
|
|
||||||
|
// isWebSocket reports whether r is a WebSocket upgrade request.
|
||||||
|
func isWebSocket(r *http.Request) bool {
|
||||||
|
return strings.EqualFold(r.Header.Get("Upgrade"), "websocket") &&
|
||||||
|
strings.Contains(strings.ToLower(r.Header.Get("Connection")), "upgrade")
|
||||||
|
}
|
||||||
|
|
||||||
|
// clientIP returns the forwarded client chain (X-Forwarded-For, set by
|
||||||
|
// Traefik/CF) when present, else the immediate peer — for correlating a drop
|
||||||
|
// to a specific client/edge.
|
||||||
|
func clientIP(r *http.Request) string {
|
||||||
|
if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
|
||||||
|
return xff
|
||||||
|
}
|
||||||
|
return r.RemoteAddr
|
||||||
|
}
|
||||||
|
|
||||||
|
// classifyClose maps a reverse-proxy copy error to which side ended the socket:
|
||||||
|
// downstream (client/CF/Traefik went away) vs upstream (the user's t3 serve
|
||||||
|
// closed/reset). Distinguishes a last-mile/client drop from a t3-serve stall.
|
||||||
|
func classifyClose(err error) string {
|
||||||
|
if err == nil {
|
||||||
|
return "graceful"
|
||||||
|
}
|
||||||
|
s := err.Error()
|
||||||
|
switch {
|
||||||
|
case strings.Contains(s, "context canceled"):
|
||||||
|
return "downstream_closed" // client / CF / Traefik tore down
|
||||||
|
case strings.Contains(s, "reset by peer"), strings.Contains(s, "broken pipe"),
|
||||||
|
strings.Contains(s, "EOF"), strings.Contains(s, "connection refused"):
|
||||||
|
return "upstream_closed" // t3 serve closed / unreachable
|
||||||
|
default:
|
||||||
|
return s
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
|
|
|
||||||
|
|
@ -301,3 +301,63 @@ func TestProbeWSEcho(t *testing.T) {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func TestIsWebSocket(t *testing.T) {
|
||||||
|
cases := []struct {
|
||||||
|
up, conn string
|
||||||
|
want bool
|
||||||
|
}{
|
||||||
|
{"websocket", "Upgrade", true},
|
||||||
|
{"websocket", "keep-alive, Upgrade", true},
|
||||||
|
{"WebSocket", "upgrade", true},
|
||||||
|
{"", "keep-alive", false},
|
||||||
|
{"h2c", "Upgrade", false},
|
||||||
|
{"websocket", "keep-alive", false},
|
||||||
|
}
|
||||||
|
for _, c := range cases {
|
||||||
|
r, _ := http.NewRequest("GET", "/ws", nil)
|
||||||
|
if c.up != "" {
|
||||||
|
r.Header.Set("Upgrade", c.up)
|
||||||
|
}
|
||||||
|
r.Header.Set("Connection", c.conn)
|
||||||
|
if got := isWebSocket(r); got != c.want {
|
||||||
|
t.Errorf("isWebSocket(up=%q conn=%q)=%v want %v", c.up, c.conn, got, c.want)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestClassifyClose(t *testing.T) {
|
||||||
|
cases := []struct {
|
||||||
|
in error
|
||||||
|
want string
|
||||||
|
}{
|
||||||
|
{nil, "graceful"},
|
||||||
|
{errTest("context canceled"), "downstream_closed"},
|
||||||
|
{errTest("read tcp 127.0.0.1:60664->127.0.0.1:3773: read: connection reset by peer"), "upstream_closed"},
|
||||||
|
{errTest("write: broken pipe"), "upstream_closed"},
|
||||||
|
{errTest("unexpected EOF"), "upstream_closed"},
|
||||||
|
{errTest("dial tcp 127.0.0.1:3773: connect: connection refused"), "upstream_closed"},
|
||||||
|
{errTest("some novel error"), "some novel error"},
|
||||||
|
}
|
||||||
|
for _, c := range cases {
|
||||||
|
if got := classifyClose(c.in); got != c.want {
|
||||||
|
t.Errorf("classifyClose(%v)=%q want %q", c.in, got, c.want)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
type errTest string
|
||||||
|
|
||||||
|
func (e errTest) Error() string { return string(e) }
|
||||||
|
|
||||||
|
func TestClientIP(t *testing.T) {
|
||||||
|
r, _ := http.NewRequest("GET", "/ws", nil)
|
||||||
|
r.RemoteAddr = "10.0.0.5:1234"
|
||||||
|
if got := clientIP(r); got != "10.0.0.5:1234" {
|
||||||
|
t.Errorf("clientIP no-xff = %q", got)
|
||||||
|
}
|
||||||
|
r.Header.Set("X-Forwarded-For", "1.2.3.4, 10.10.1.1")
|
||||||
|
if got := clientIP(r); got != "1.2.3.4, 10.10.1.1" {
|
||||||
|
t.Errorf("clientIP xff = %q", got)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue