noVNC scaled correctly but the emulator's Qt window opened small (~411x914)
and floated inside the 1080x2280 Xvfb, so the user saw a tiny phone in a sea
of black. v8 bakes in a background fitter (wmctrl+xdotool) that, after
boot, auto-OKs the one-shot nested-virtualization warning dialog, fills
the phone window to the display, and parks the control strip off the
right edge. It re-runs to catch window/dialog timing, then re-applies
the layout every 30s. Already applied live to the running pod; baking it
in makes it survive the next wake.
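
A minimal sketch of the fitter's geometry step, not the baked script
itself. It only builds the wmctrl invocations; the window titles passed
in are hypothetical, and the dialog auto-OK (xdotool) is left out.
Whether a window manager honours an off-screen position is an
assumption here.

```python
import subprocess

DISPLAY_W, DISPLAY_H = 1080, 2280  # Xvfb size quoted in the message above


def fit_command(title: str, width: int = DISPLAY_W, height: int = DISPLAY_H) -> list[str]:
    """wmctrl call that moves a window to 0,0 and resizes it to the display.

    wmctrl -e takes gravity,x,y,width,height; gravity 0 is the default.
    """
    return ["wmctrl", "-r", title, "-e", f"0,0,0,{width},{height}"]


def park_command(title: str, width: int = DISPLAY_W) -> list[str]:
    """wmctrl call that moves a window just past the right edge.

    -1 for width/height tells wmctrl to leave the size unchanged.
    """
    return ["wmctrl", "-r", title, "-e", f"0,{width},0,-1,-1"]


def run_once(phone_title: str, strip_title: str) -> None:
    """Apply both moves; failures are ignored so a later pass can retry."""
    for cmd in (fit_command(phone_title), park_command(strip_title)):
        subprocess.run(cmd, check=False)
```

Calling run_once in a loop with a 30s sleep would approximate the
maintenance pass described above.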

Viktor's screen rendered unscaled on a bare /vnc.html. The entrypoint
now writes /usr/share/novnc/defaults.json (resize=scale, autoconnect,
reconnect with 2s delay, shared) so every page load behaves correctly
without URL params, and viewers self-heal across pod restarts/wakes.
Already applied live to the running pod; writing the file from the
entrypoint makes it survive the next wake.
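
A sketch of what the written file could contain, assuming noVNC's
setting names (resize, autoconnect, reconnect, reconnect_delay, shared)
and that reconnect_delay is in milliseconds. The real entrypoint may
write it differently.

```python
import json
from pathlib import Path

# Values taken from the message above; key names are assumed to match
# noVNC's UI setting names.
NOVNC_DEFAULTS = {
    "resize": "scale",
    "autoconnect": True,
    "reconnect": True,
    "reconnect_delay": 2000,  # 2s, expressed in milliseconds (assumption)
    "shared": True,
}


def write_defaults(path: str = "/usr/share/novnc/defaults.json") -> None:
    """Write the defaults file, creating the parent directory if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(NOVNC_DEFAULTS, indent=2) + "\n")
```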

Viktor's noVNC sat at 'Connecting…' forever: the WebSocket traversed
Cloudflare/Authentik/websockify fine, but x11vnc never sent the RFB
banner — strace showed it sweeping the container's fd table with one
fcntl per fd, and containerd grants RLIMIT_NOFILE=2147483584 here, so
each connection swept ~2.1 billion fds and effectively never
completed. The entrypoint now sets
ulimit -n 65536 for everything it launches (verified live: banner
answers instantly under the capped limit); x11vnc also gets -nolookup
so client reverse-DNS can never stall handshakes.
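
The same cap expressed in Python, as a sketch rather than the
entrypoint's shell code. The helper names are hypothetical. It only ever
lowers limits, which an unprivileged process is allowed to do.

```python
import resource

FD_CAP = 65536  # the value the entrypoint passes to ulimit -n


def capped(value: int, cap: int = FD_CAP) -> int:
    """Lower value to cap; RLIM_INFINITY is treated as above any cap."""
    if value == resource.RLIM_INFINITY or value > cap:
        return cap
    return value


def cap_nofile(cap: int = FD_CAP) -> tuple[int, int]:
    """Clamp this process's soft and hard RLIMIT_NOFILE to at most cap."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_limits = (capped(soft, cap), capped(hard, cap))
    resource.setrlimit(resource.RLIMIT_NOFILE, new_limits)
    return new_limits
```

Passing cap_nofile as preexec_fn to subprocess.Popen would apply the cap
to a single child instead of everything the launcher starts.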

Viktor's direction (2026-06-12): the emulator is dev-only, so it should
be on-demand, and it should use the T4 where applicable. (1) api36-v5
runs '-gpu host' on the GPU node (nodeSelector + time-slice + EGL libs;
automatic swiftshader fallback if GPU init dies) — screen-on rendering
moves off the CPU (~5 cores → expected 1-2). (2) The wake gate (stdlib
python, owns / on both hostnames) scales the deployment 0→1 on visit and
hands the browser to noVNC when ready; agents GET /wake + /status. The
idle-sleeper CronJob counts established adb/noVNC connections via
/proc/net/tcp (excluding the in-container loopback adb client) and scales
to zero after 4 idle checks (~1h). Terraform ignores replicas drift, so
an apply does not undo the scaling. VRAM cost (~0.5-1GiB) is held only
while awake, protecting llama-swap headroom.
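
A sketch of the idle check's counting step, assuming the IPv4
/proc/net/tcp layout on a little-endian host (127.x.x.x remotes end in
"7F"). Port 6080 for noVNC is a hypothetical value; 5555 is the adb port
named above. /proc/net/tcp6 is not covered.

```python
ESTABLISHED = "01"  # TCP state code used in /proc/net/tcp
IDLE_CHECKS_BEFORE_SLEEP = 4


def count_established(proc_net_tcp: str, ports: set[int]) -> int:
    """Count established connections to the given local ports.

    Connections whose remote address is loopback are skipped, which
    drops the in-container adb client.
    """
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        local_port = int(local.rsplit(":", 1)[1], 16)
        remote_ip = remote.rsplit(":", 1)[0]
        if state != ESTABLISHED or local_port not in ports:
            continue
        if remote_ip.endswith("7F"):  # 127.0.0.0/8 in little-endian hex
            continue
        count += 1
    return count


def next_idle_streak(streak: int, connections: int) -> int:
    """Reset the streak on any activity, otherwise extend it by one."""
    return 0 if connections else streak + 1


def should_sleep(streak: int) -> bool:
    return streak >= IDLE_CHECKS_BEFORE_SLEEP
```

How the streak is persisted between CronJob runs is not stated in the
message and is left open here.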

Two final fixes from the live debugging session: (1) sdkmanager-latest
emulator 36.6.11 hangs before executing a single guest instruction in
this pod (KVM and TCG alike, every gpu mode, crash-reporting on or off)
while 36.1.9 boots Android in ~107s — the entrypoint now pins build
13823996 on the PVC; (2) the emulator already listens on 127.0.0.1:5555,
so socat's wildcard bind died with EADDRINUSE and its exit restarted the
pod right after a successful boot — socat now binds the pod IP only.
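
The bind conflict in (2) can be reproduced with plain sockets. This is a
Linux-oriented sketch: 127.0.0.2 stands in for the pod IP (an
assumption, since any non-overlapping specific address behaves the same
way), and an ephemeral port stands in for 5555.

```python
import errno
import socket


def try_bind(host: str, port: int):
    """Bind and listen on host:port; return (socket, None) or (None, errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


def demo():
    """Return (wildcard_errno, specific_errno) for the same port."""
    emulator, _ = try_bind("127.0.0.1", 0)  # stand-in for the emulator listener
    port = emulator.getsockname()[1]
    wildcard, wildcard_err = try_bind("0.0.0.0", port)    # overlaps 127.0.0.1
    specific, specific_err = try_bind("127.0.0.2", port)  # does not overlap
    for sock in (emulator, wildcard, specific):
        if sock is not None:
            sock.close()
    return wildcard_err, specific_err
```

The wildcard address overlaps every local address, so it collides with
the loopback listener; a single specific address does not.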

v2's marker fix proved the install completes, but avdmanager still saw
no system images: it IGNORES ANDROID_SDK_ROOT (and has no --sdk_root),
deriving the SDK root from its own toolsdir — /opt/android in our image,
while packages live on the PVC at /sdk. v3 seeds cmdline-tools into
/sdk/cmdline-tools/latest once and runs avdmanager from there, so it
resolves the PVC as the SDK root.
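
To illustrate the path arithmetic only: assuming the tool treats the
directory two levels above its own cmdline-tools/<version> dir as the
SDK root, the two locations resolve as below. The
/opt/android/cmdline-tools/latest path is a hypothetical image layout,
and derived_sdk_root is not an avdmanager API.

```python
from pathlib import PurePosixPath


def derived_sdk_root(toolsdir: str) -> PurePosixPath:
    """SDK root implied by a toolsdir of the form <root>/cmdline-tools/<version>."""
    return PurePosixPath(toolsdir).parent.parent


# Image copy (assumed layout): resolves to the image, not the PVC.
image_root = derived_sdk_root("/opt/android/cmdline-tools/latest")

# Seeded copy on the PVC: resolves to /sdk, where the packages live.
pvc_root = derived_sdk_root("/sdk/cmdline-tools/latest")
```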

First boot crashed mid-SDK-install, and the dir-existence check then
skipped reinstall forever: avdmanager saw the partial tree and died with
'Valid system image paths are: null' (CrashLoopBackOff). v2 tracks
install completion with a marker file written only after sdkmanager
succeeds and package.xml exists, wipes partial system-image trees before
reinstalling, and retries sdkmanager up to 3 times.
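
A sketch of the marker logic under stated assumptions: the install
callable stands in for the sdkmanager invocation, and the marker path
and function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable


def install_complete(marker: Path, package_xml: Path) -> bool:
    """Trust the install only if both the marker and package.xml exist."""
    return marker.is_file() and package_xml.is_file()


def ensure_installed(
    image_dir: Path,
    marker: Path,
    install: Callable[[], bool],
    attempts: int = 3,
) -> bool:
    """Install into image_dir unless a previous run fully completed."""
    package_xml = image_dir / "package.xml"
    if install_complete(marker, package_xml):
        return True
    marker.unlink(missing_ok=True)  # a stale marker must not survive
    for _ in range(attempts):
        # Wipe any partial tree so the installer starts from scratch.
        shutil.rmtree(image_dir, ignore_errors=True)
        if install() and package_xml.is_file():
            marker.write_text("ok\n")  # written last, only on success
            return True
    return False
```

Writing the marker last is what distinguishes a finished install from a
directory that merely exists.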

Viktor is setting up an Android app development pipeline (tripit is the
first app) and wants agents to natively test changes on Android before
shipping. This adds the testing environment: an API-36 Google emulator
under KVM as a privileged pod (namespace joins the Kyverno exclude list),
SDK/system-image/AVD on a proxmox-lvm PVC, adb on the shared MetalLB IP
10.0.20.200:5555 (LAN only), noVNC screen view at
android-emulator.viktorbarzin.lan. The image is built manually from the
stack's docker/ dir (rebuilds are rare; the off-infra-CI rule targets
repeated builds). The first infra ADR records the trade-offs
(devvm/VM/redroid/budtmo rejected).