infra/stacks/chrome-service/files/novnc/entrypoint.sh
Viktor Barzin c7ead032ec
chrome-service: fix noVNC stuck-"Connecting" (x11vnc fd-sweep under nofile=2^31)
The noVNC view hung on "Connecting" until it timed out. Root cause: x11vnc
sweeps the entire fd table (one fcntl per fd) on every client connection, and
containerd grants pods RLIMIT_NOFILE=2^31, so the RFB handshake never completes.
websockify accepts the WS and dials localhost:5900, but x11vnc never sends its
banner (verified: the handshake timed out at 8s, and x11vnc had burned 1h41m of
CPU spinning). The android-emulator stack already carries the same bug and fix.
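
The diagnosis above can be reproduced with a small probe. The following is a minimal sketch, not necessarily how the check was originally run: `soft_nofile_of` and `rfb_banner` are hypothetical helper names, and the 8s default only mirrors the timeout quoted above. It relies on bash's `/dev/tcp` pseudo-device and on Linux's `/proc/<pid>/limits`.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helpers; names and defaults are illustrative.

# Print the soft RLIMIT_NOFILE of a running process (Linux /proc only).
soft_nofile_of() {
  local pid=$1
  # Line format: "Max open files  <soft>  <hard>  files"
  awk '/^Max open files/ { print $4 }' "/proc/$pid/limits"
}

# Print the RFB version banner sent by host:port. Return non-zero if the
# connection fails or no banner arrives within the timeout (default 8s).
rfb_banner() {
  local host=$1 port=$2 timeout=${3:-8} banner=""
  # fd 3 is a read/write TCP socket that exists only for this group.
  {
    IFS= read -r -t "$timeout" -n 12 banner <&3
  } 2>/dev/null 3<>"/dev/tcp/$host/$port" || return 1
  # An RFB server opens the handshake with a line starting "RFB ".
  [[ $banner == "RFB "* ]] || return 1
  printf '%s\n' "$banner"
}
```

Inside the pod, `rfb_banner 127.0.0.1 5900 8 || echo "no RFB banner"` would distinguish a server that accepts the TCP connection but never speaks from one that completes the handshake, and `soft_nofile_of "$(pgrep -o x11vnc)"` shows the limit the process actually inherited.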

Cap nofile before x11vnc starts, in two places:
- files/novnc/entrypoint.sh: `ulimit -n 65536` (root fix, makes the image correct)
- main.tf novnc container: `command = ["bash","-c","ulimit -n 65536; exec /entrypoint.sh"]`
  so the cap applies deterministically on rollout even though the image is
  :latest/IfNotPresent (a rebuilt entrypoint isn't guaranteed to be re-pulled).
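
A guarded variant of the cap, shown only as a sketch: `cap_nofile` is a hypothetical helper, not part of the commit. It lowers the limit when it exceeds the target and otherwise leaves it alone. The reason for the guard is an assumption about other environments: in bash, `ulimit -n N` without `-S` or `-H` sets both the soft and hard limits, so on a host whose hard limit is already below the target a plain `ulimit -n 65536` would fail, and under `set -e` that would abort the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lower RLIMIT_NOFILE to at most $1, never raise it.
cap_nofile() {
  local target=$1 current
  current=$(ulimit -n)   # soft limit: an integer or the word "unlimited"
  if [[ $current == unlimited ]] || (( current > target )); then
    # Sets both the soft and hard limits; lowering is always permitted.
    ulimit -n "$target"
  fi
}
```

Calling `cap_nofile 65536` before launching x11vnc has the same effect as the plain `ulimit -n 65536` when the inherited limit is 2^31, and is a no-op when the limit is already lower.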

Also documents the gotcha + diagnosis in docs/architecture/chrome-service.md and
notes the black-when-idle behaviour + the autoconnect URL.

(A live x11vnc relaunch with the cap already unblocked the running pod; this
makes it survive restarts.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 17:34:03 +00:00

46 lines
1.6 KiB
Bash

#!/usr/bin/env bash
# Connect to the chrome-service container's Xvfb (shared pod network, TCP)
# and serve the noVNC HTML5 client + websockify bridge on :6080.
set -e
# containerd grants pods an effectively unbounded RLIMIT_NOFILE (2^31). x11vnc
# sweeps the WHOLE fd table with fcntl on every client connection, so each VNC
# connect hangs effectively forever and the noVNC client sits on "Connecting"
# until it times out. Cap the limit before launching x11vnc. (Same fix as the
# android-emulator stack; see docs/architecture/chrome-service.md "noVNC fd-sweep".)
ulimit -n 65536
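for i in {1..15}; do
  # Probe in a subshell so 2>/dev/null also hides bash's own "Connection
  # refused" message; a trailing 2>/dev/null is applied too late to catch it.
  if (echo > /dev/tcp/127.0.0.1/6099) 2>/dev/null; then
    echo "Xvfb TCP up after attempt $i"
    break
  fi
  echo "waiting for Xvfb TCP 6099 attempt=$i"
  sleep 2
done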
# websockify runs as PID 1; x11vnc is a child, so its logs land on container
# stdout. `-noshm` skips MIT-SHM probes that fail across container boundaries
# (each container has its own /dev/shm); `-noxdamage` skips XDAMAGE, which Xvfb
# doesn't expose; `-quiet` keeps the polling chatter out of pod logs.
echo "starting x11vnc -> :5900"
x11vnc -display localhost:99 -nopw -listen 0.0.0.0 -rfbport 5900 \
  -forever -shared -noshm -noxdamage -quiet 2>&1 &
X11VNC_PID=$!
for i in {1..10}; do
  # Subshell probe: see the note on stderr ordering in the Xvfb wait loop.
  if (echo > /dev/tcp/127.0.0.1/5900) 2>/dev/null; then
    echo "x11vnc bound 5900 after attempt $i"
    break
  fi
  echo "waiting for x11vnc :5900 attempt=$i"
  sleep 2
done
# Fail the container if x11vnc still is not listening after the final attempt.
if ! (echo > /dev/tcp/127.0.0.1/5900) 2>/dev/null; then
  echo "ERROR: x11vnc did not bind 5900"
  exit 1
fi
echo "starting websockify -> :6080"
exec websockify --web=/usr/share/novnc 6080 localhost:5900
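
The two wait loops in the script share one pattern: poll a TCP port through `/dev/tcp` a fixed number of times with a fixed delay. A minimal sketch of that pattern as a single function follows. `wait_for_port` and its argument order are hypothetical, not part of the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll host:port until it accepts a TCP connection.
# Usage: wait_for_port HOST PORT [ATTEMPTS=10] [DELAY_SECONDS=2]
wait_for_port() {
  local host=$1 port=$2 attempts=${3:-10} delay=${4:-2} i
  for (( i = 1; i <= attempts; i++ )); do
    # Subshell: the socket closes right after the probe, and the outer
    # 2>/dev/null hides bash's connection-error message.
    if (: > "/dev/tcp/$host/$port") 2>/dev/null; then
      echo "$host:$port up after attempt $i"
      return 0
    fi
    echo "waiting for $host:$port attempt=$i"
    sleep "$delay"
  done
  return 1   # never came up within the allotted attempts
}
```

With this helper the x11vnc wait and its final check would collapse to `wait_for_port 127.0.0.1 5900 10 2 || { echo "ERROR: x11vnc did not bind 5900"; exit 1; }`. The script as written keeps going if the Xvfb wait runs out, so whether `wait_for_port 127.0.0.1 6099 15 2` should also abort on failure is a design choice left open here.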