t3: connection logging across the path for drop attribution
Viktor asked to add connection logs (Traefik/Cloudflare) to catch the
real-path t3 WS drops: a direct-to-t3-serve browser ran 40 min clean
while real tunnel sessions cycle every 15-35s, so the drop originates
above t3-serve and we need to see which layer cuts the socket.
Traefik (/ws duration) and cloudflared (WS close events) already ship to
Loki; the gap was the devvm side. This adds:
- t3-dispatch logs every /ws open/close with dur_ms + cause:
  downstream_closed (client/CF/Traefik hung up = last-mile/network),
  upstream_closed (t3-serve closed/reset), or graceful. Graceful closes
  previously left no trace (default ReverseProxy only logs on error), so a
  watchdog-driven reconnect was invisible. Helpers unit-tested.
- devvm-promtail.{yaml,service}: ships devvm journald (t3-dispatch +
  t3-serve@<user>) to cluster Loki as job=devvm-journal, mirroring the
  pve/rpi-sofia shippers. devvm was never in Loki (standalone VM).
Joined in Loki, the three layers attribute any future drop to a segment
with no repro needed. Runbook + service-catalog updated.
parent 933e4649fb
commit 9b19caff47
7 changed files with 231 additions and 3 deletions
scripts/devvm-promtail.service (normal file, 17 lines)
@@ -0,0 +1,17 @@
# systemd unit for promtail on the devvm (10.0.10.10). Install to
# /etc/systemd/system/promtail.service. See scripts/devvm-promtail.yaml for the full deploy.
[Unit]
Description=Promtail (ships devvm journal -> cluster Loki)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yml
Restart=on-failure
RestartSec=5
User=root
Group=root

[Install]
WantedBy=multi-user.target