chrome-service: supervise x11vnc in noVNC sidecar so the VNC view self-heals
The noVNC view at chrome.viktorbarzin.me went black. x11vnc (in the novnc sidecar) attaches to the browser container's Xvfb over localhost:6099; when that container restarted (~8h ago, Chrome exited cleanly), x11vnc lost its X connection and exited. Because the entrypoint ran x11vnc as an unsupervised background child and then exec'd websockify as PID 1, the dead x11vnc was never relaunched: :5900 stayed dead (x11vnc lingered as a defunct zombie), websockify kept returning 'Connection refused', and the view stayed black until a manual pod restart.

Fix: the entrypoint now runs both x11vnc and websockify as supervised background children and exits non-zero via 'wait -n' if either dies. The kubelet then restarts the novnc container, which re-waits for Xvfb and relaunches x11vnc, so the bridge now self-heals across browser-container restarts. This mirrors the supervision pattern in the android-emulator stack.

The architecture doc is updated with the new failure mode, its diagnosis, immediate recovery, and a SHA-pin deploy note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent abb15cd49d
commit 19d0f0933a
2 changed files with 57 additions and 6 deletions
@@ -19,14 +19,14 @@ for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
  sleep 2
done

# websockify runs as PID 1; x11vnc is a child so its logs land on container stdout
# `-noshm` skips MIT-SHM probes that fail across container boundaries (each
# container has its own /dev/shm); `-noxdamage` skips XDAMAGE which Xvfb
# doesn't expose; `-quiet` keeps the polling chatter out of pod logs.
# Both x11vnc and websockify run as supervised children of this entrypoint (PID
# 1) so their logs land on container stdout and the `wait -n` at the end can catch
# either one dying. `-noshm` skips MIT-SHM probes that fail across container
# boundaries (each container has its own /dev/shm); `-noxdamage` skips XDAMAGE
# which Xvfb doesn't expose; `-quiet` keeps the polling chatter out of pod logs.
echo "starting x11vnc -> :5900"
x11vnc -display localhost:99 -nopw -listen 0.0.0.0 -rfbport 5900 \
  -forever -shared -noshm -noxdamage -quiet 2>&1 &
X11VNC_PID=$!

for i in 1 2 3 4 5 6 7 8 9 10; do
  if echo > /dev/tcp/127.0.0.1/5900 2>/dev/null; then
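The x11vnc line above points at display `localhost:99`, while the commit message describes Xvfb as reachable over localhost:6099. The two agree because an X server for display N listens on TCP port 6000 + N. Before starting websockify, the entrypoint also polls :5900 with bash's `/dev/tcp` redirection. The sketch below factors both ideas into small helpers; `x11_tcp_port` and `wait_for_tcp` are hypothetical names, not functions in the entrypoint, and the default retry count and interval are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (NOT part of the novnc entrypoint), sketched for illustration.

# Map an X11 display string such as "localhost:99" or ":99.0" to the TCP port
# an X server listens on for that display (6000 + display number).
x11_tcp_port() {
  local disp=${1##*:}   # drop the host part: "localhost:99.0" -> "99.0"
  disp=${disp%%.*}      # drop the screen number: "99.0" -> "99"
  if [[ ! $disp =~ ^[0-9]+$ ]]; then
    echo "x11_tcp_port: cannot parse display '$1'" >&2
    return 1
  fi
  echo $((6000 + 10#$disp))   # 10# forces base 10 so "08" is not read as octal
}

# Poll HOST:PORT via bash's /dev/tcp redirection until it accepts a connection.
# Usage: wait_for_tcp HOST PORT [TRIES=10] [INTERVAL_SECONDS=1]  (defaults are assumptions)
wait_for_tcp() {
  local host=$1 port=$2 tries=${3:-10} interval=${4:-1} i
  for ((i = 1; i <= tries; i++)); do
    # Redirections are applied left to right, so a `2>/dev/null` placed after
    # `> /dev/tcp/...` on the same command is not yet in effect when the connect
    # fails. Redirecting stderr on the enclosing group silences that error.
    if { echo > "/dev/tcp/$host/$port"; } 2>/dev/null; then
      return 0
    fi
    (( i < tries )) && sleep "$interval"
  done
  return 1
}
```

For example, `wait_for_tcp 127.0.0.1 "$(x11_tcp_port localhost:99)" 15 2` would give the same 15-try, 2-second budget as the loop in the hunk header, assuming (the loop body is not shown) that this loop is the one waiting for Xvfb.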
@@ -43,4 +43,18 @@ if ! echo > /dev/tcp/127.0.0.1/5900 2>/dev/null; then
fi

echo "starting websockify -> :6080"
exec websockify --web=/usr/share/novnc 6080 localhost:5900
# Run websockify in the background (it was `exec`ed before) so BOTH it and x11vnc
# are supervised. x11vnc attaches to the chrome-service container's Xvfb over
# localhost:6099 (shared pod network); when that container restarts, x11vnc loses
# its X connection and exits. Previously websockify was PID 1 and x11vnc was an
# unsupervised child, so a dead x11vnc was never relaunched: :5900 stayed dead and
# the noVNC view went black until a manual pod restart. Now if EITHER process
# exits, `wait -n` returns and we exit non-zero so the kubelet restarts this
# container, which re-waits for Xvfb and relaunches x11vnc — the bridge self-heals
# across browser-container restarts. (Same supervision pattern as the
# android-emulator stack's entrypoint.)
websockify --web=/usr/share/novnc 6080 localhost:5900 &

wait -n || true
echo "novnc: a supervised process (x11vnc or websockify) exited; exiting so the kubelet restarts this container." >&2
exit 1
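The same `wait -n` pattern extends to any number of children. Below is a minimal sketch, assuming bash 4.3 or later (the release that added `wait -n`); `supervise` is a hypothetical helper, not part of the entrypoint. One deliberate difference: the entrypoint can simply `exit 1`, because once it exits the container runtime stops the container along with whatever is still running in it, whereas the sketch stops the surviving children itself so it also behaves sensibly outside a container.

```shell
#!/usr/bin/env bash
# Minimal sketch: run N children, return non-zero as soon as any one exits.
# `supervise` is a hypothetical helper; requires bash >= 4.3 for `wait -n`.

supervise() {
  local pids=() cmd
  for cmd in "$@"; do
    bash -c "$cmd" &          # each argument is one command line, run as a child
    pids+=("$!")
  done

  wait -n || true             # block until the FIRST child exits, whatever its status
  echo "supervise: a child exited; stopping the others" >&2

  # Outside a container nothing else stops the survivors, so signal them here.
  # Signalling a PID that has already exited just fails quietly.
  kill "${pids[@]}" 2>/dev/null || true
  wait || true                # reap the children we just signalled
  return 1                    # non-zero, like the entrypoint's `exit 1`
}
```

Passing the entrypoint's x11vnc and websockify command lines as the two arguments, followed by `|| exit 1`, would approximate its behaviour, except that the entrypoint also waits for :5900 to accept connections before it starts websockify.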