infra/stacks/android-emulator
Viktor Barzin 3e82c64a76 docs: sync CI/CD docs to ADR-0002 final state (ghcr + Woodpecker deploy-only) [ci skip]
ADR-0002 is fully landed (issues #11-#32 closed): every owned image now
builds on GitHub Actions and pushes to ghcr.io/viktorbarzin/<name>, with
Woodpecker reduced to deploy-only. The Forgejo container registry is frozen
and emptied; there are no in-cluster image builds or CI test runs anywhere.
The docs still described the old hybrid topology (DockerHub builds,
Woodpecker-native owned-app builds, the per-pattern migration lists, the
tripit-only pilot framing), which would mislead future sessions and
incident response.

This brings the docs to the completed reality (closes #33):

- docs/architecture/ci-cd.md: full rewrite as the canonical CI/CD reference —
  the fleet GHA->ghcr->Woodpecker-deploy pattern, public/private ghcr package
  split, infra-owned image workflows (incl. infra-ci on ghcr), the frozen
  Forgejo registry, what Woodpecker still runs, and the #31 decommissions.
- .claude/CLAUDE.md: rewrite the "CI/CD Architecture" section to the
  fleet-wide final state; FIX the stale claim that claude-memory-mcp builds
  to DockerHub (it is GHA->ghcr); note owned images now live on ghcr and the
  Forgejo registry is frozen/break-glass near the image-registry bullet.
- .claude/reference/service-catalog.md: f1-stream is GHA->ghcr + Woodpecker
  deploy-only (was "Woodpecker-native build->deploy").
- stacks/{tuya-bridge,android-emulator}/variables.tf + stacks/terminal/main.tf:
  cosmetic description/comment updates (forgejo -> ghcr; terminal-lobby has no
  CI pipeline). Description/comment text only — no stack logic changed.

Historical records (docs/post-mortems/*, docs/plans/*) and ADR-0002 itself
are left untouched as point-in-time records.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 12:55:49 +00:00

android-emulator — shared in-cluster Android testing instance

Android 16 (API 36, google_apis/x86_64) emulator running under KVM in the cluster, so agents can natively test app/PWA changes before shipping (first tenant: tripit). Decision record: docs/adr/0001-android-emulator-in-cluster.md.

On-demand lifecycle (since 2026-06-12)

The emulator scales to zero when idle (no adb/VNC connections for ~1h, checked by the android-emulator-idle-sleeper CronJob) and wakes on visit: the wake gate owns / on both hostnames. Warm boot is ~90s.

  • Humans: open https://android-emulator.viktorbarzin.me — it wakes the emulator if needed, shows a self-refreshing boot page, then hands over to the noVNC screen.

  • Agents (before adb): wake + poll, then connect:

    curl -ks --resolve android-emulator.viktorbarzin.lan:443:10.0.20.203 https://android-emulator.viktorbarzin.lan/wake
    until curl -ks --resolve android-emulator.viktorbarzin.lan:443:10.0.20.203 https://android-emulator.viktorbarzin.lan/status | grep -q '"ready": 1'; do sleep 5; done
    adb connect 10.0.20.200:5555
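
The wake-and-poll loop above spins forever if the emulator never becomes ready. A minimal sketch of the same sequence with a timeout follows. The function names (emu_curl, emu_is_ready, wake_and_wait) and the 300-second default are illustrative assumptions; the hostname, IP, endpoints and the '"ready": 1' match string are taken from the commands above.

```shell
EMU_HOST="android-emulator.viktorbarzin.lan"
EMU_GATE_IP="10.0.20.203"

# GET a path on the wake gate, pinning the hostname to the LAN IP.
emu_curl() {
  curl -ks --resolve "${EMU_HOST}:443:${EMU_GATE_IP}" "https://${EMU_HOST}$1"
}

# Reads the /status body on stdin; succeeds once it reports ready.
emu_is_ready() {
  grep -q '"ready": 1'
}

# wake_and_wait [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
# Returns 0 when the emulator is ready, 1 on timeout or if /wake fails.
wake_and_wait() {
  waw_timeout="${1:-300}"   # assumed default, generous for a ~90s warm boot
  waw_interval="${2:-5}"
  waw_waited=0
  emu_curl /wake >/dev/null || return 1
  until emu_curl /status | emu_is_ready; do
    waw_waited=$((waw_waited + waw_interval))
    if [ "$waw_waited" -ge "$waw_timeout" ]; then
      return 1
    fi
    sleep "$waw_interval"
  done
}
```

Typical use would be `wake_and_wait 300 && adb connect 10.0.20.200:5555`, so a gate that never reports ready fails the script instead of hanging it.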
    

Endpoints

What            Where
adb             adb connect 10.0.20.200:5555 (LAN only; adb is unauthenticated, never expose publicly)
Screen (noVNC)  https://android-emulator.viktorbarzin.lan/vnc.html (LAN only)

Agent quickstart (from a devvm)

# one-time: user-local platform-tools
wget -qO /tmp/pt.zip https://dl.google.com/android/repository/platform-tools-latest-linux.zip
unzip -q /tmp/pt.zip -d ~/android-sdk   # → ~/android-sdk/platform-tools/adb

adb="$HOME/android-sdk/platform-tools/adb"
$adb connect 10.0.20.200:5555
$adb -s 10.0.20.200:5555 install app-debug.apk          # install an APK
$adb -s 10.0.20.200:5555 shell am start -a android.intent.action.VIEW -d https://tripit.viktorbarzin.me   # open a URL
$adb -s 10.0.20.200:5555 shell input tap 540 1200        # drive the UI
$adb -s 10.0.20.200:5555 exec-out screencap -p > /tmp/screen.png   # screenshot

The emulator is a single shared instance: check what is already installed with adb shell pm list packages, uninstall your test app when done, and presence-claim (presence claim service:android-emulator) for long destructive sessions (wipes, system-image changes).
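
One way to make the uninstall step automatic is to wrap a test run so the app is removed even when the test command fails. This is a sketch under assumptions: emu and with_test_app are hypothetical helper names, com.example.app in the usage note is a placeholder package, and ADB defaults to the quickstart path above.

```shell
ADB="${ADB:-$HOME/android-sdk/platform-tools/adb}"
EMU_SERIAL="${EMU_SERIAL:-10.0.20.200:5555}"

# Run any adb subcommand against the shared emulator.
emu() {
  "$ADB" -s "$EMU_SERIAL" "$@"
}

# with_test_app APK PACKAGE COMMAND [ARGS...]
# Installs APK, runs COMMAND, then always uninstalls PACKAGE.
# Returns COMMAND's exit status (or 1 if the install itself fails).
with_test_app() {
  wta_apk="$1"
  wta_pkg="$2"
  shift 2
  emu install -r "$wta_apk" || return 1
  wta_status=0
  "$@" || wta_status=$?
  emu uninstall "$wta_pkg"
  return "$wta_status"
}
```

For example, `with_test_app app-debug.apk com.example.app emu shell input tap 540 1200` installs the APK, sends one tap, and removes the package again.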

How it works

  • The container image (built from docker/) holds only JDK 17, cmdline-tools, emulator native libs, Xvfb/x11vnc/noVNC and socat — ~1GB.
  • The SDK proper (platform-tools, emulator, system image, AVD, snapshots) lives on the android-emulator-sdk PVC (proxmox-lvm); the entrypoint installs it idempotently. First boot downloads ~2.5GB (≈9GB unpacked on the PVC) and takes ~15 min (startup probe allows 30); subsequent restarts boot in ~1–2 min.
  • The emulator runs on the GPU node (k8s-node1) with a T4 time-slice (qemu holds ~100 MiB VRAM while awake; scale-to-zero keeps it transient). Guest GL is deliberately SOFTWARE (llvmpipe): rendering into Xvfb pins GL to the X stack, and true NVIDIA headless GL would need -no-window plus the emulator's own streaming instead of x11vnc — not worth it at the measured CPU numbers below.
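
The idempotent install mentioned in the second bullet can be pictured as a check-then-install step per SDK package. This is a sketch, not the stack's actual entrypoint: the /sdk mount path, the helper names and the directory-exists test are assumptions. It relies on sdkmanager's --sdk_root flag and on the cmdline-tools convention that a package ID maps to a directory by replacing ';' with '/'.

```shell
SDK_ROOT="${SDK_ROOT:-/sdk}"            # assumed PVC mount point
SDKMANAGER="${SDKMANAGER:-sdkmanager}"  # from cmdline-tools in the image

# Map a package ID to its install directory under SDK_ROOT,
# e.g. "a;b;c" -> "$SDK_ROOT/a/b/c".
pkg_dir() {
  printf '%s/%s\n' "$SDK_ROOT" "$(printf '%s' "$1" | tr ';' '/')"
}

# Install a package only if its directory is missing, so repeated
# container starts on the same PVC do not download it again.
# Note: a directory check is crude; a half-finished install would pass it.
ensure_pkg() {
  if [ -d "$(pkg_dir "$1")" ]; then
    echo "skip $1 (already installed)"
    return 0
  fi
  "$SDKMANAGER" --sdk_root="$SDK_ROOT" "$1"
}
```

An entrypoint built this way would call `ensure_pkg platform-tools`, `ensure_pkg emulator` and `ensure_pkg 'system-images;android-36;google_apis;x86_64'` on every start; the system-image ID is inferred from the API level and image flavour named above.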

Rebuilding the image (rare — tool/library bumps only)

cd stacks/android-emulator/docker
docker build -t forgejo.viktorbarzin.me/viktor/android-emulator:<new-tag> .
docker push forgejo.viktorbarzin.me/viktor/android-emulator:<new-tag>
# then bump var.image_tag default in variables.tf and land via CI

Built manually from a devvm on purpose: it changes rarely, and a one-off push doesn't warrant CI plumbing (the off-infra-CI rule targets repeated build IO).

Troubleshooting

  • Pod CrashLoops with FATAL: /dev/kvm not present → node lost the device or the privileged/Kyverno exclude regressed (android-emulator must be in security_policy_exclude_namespaces, stacks/kyverno).
  • Wedged Android (won't boot, storage full) → delete the PVC + pod: next boot re-downloads cleanly. Snapshots/AVD state are disposable by design.
  • Different API level: set API_LEVEL env on the deployment (entrypoint installs that system image on the same PVC) or recreate the AVD.
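
For the API-level case, setting the env var by hand could look like the sketch below. The deployment name android-emulator is an assumption; the namespace is taken from the Kyverno exclude entry above. Because the stack is managed through Terraform, a later apply will probably revert a manual change like this, so treat it as a quick experiment rather than the way to land a permanent switch.

```shell
KUBECTL="${KUBECTL:-kubectl}"
EMU_NS="${EMU_NS:-android-emulator}"
EMU_DEPLOY="${EMU_DEPLOY:-android-emulator}"   # assumed deployment name

# set_api_level LEVEL
# Sets API_LEVEL on the deployment; rejects anything that is not a number.
set_api_level() {
  case "$1" in
    ''|*[!0-9]*)
      echo "usage: set_api_level <numeric API level>" >&2
      return 2
      ;;
  esac
  "$KUBECTL" -n "$EMU_NS" set env "deployment/$EMU_DEPLOY" "API_LEVEL=$1"
}
```

Calling `set_api_level 35` changes the pod template, so the next start of the emulator pod picks up the new value.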

Resource profile (measured 2026-06-12, v6 on node1)

  • Asleep (scaled to zero): nothing — the gate (~10m CPU/13Mi) is the only standing cost.
  • Awake: settles to ~0.5–1.3 cores with a static screen (on or off), ~4.8–5.2 Gi memory (limit 8 Gi, requests 3 Gi), ~100 MiB T4 VRAM. Boot bursts to 5–9 cores for the first few minutes (dex2oat etc.).
  • Disk: ~7 G of the 30 Gi PVC.
  • Etiquette still applies for long sessions with animated content: adb -s 10.0.20.200:5555 shell input keyevent KEYCODE_SLEEP when done.
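
The end-of-session etiquette can be folded into one helper. In this sketch, emu_done is a hypothetical name and ADB defaults to the quickstart path. The disconnect step is an addition not described above; it assumes that dropping the adb connection helps the idle sleeper notice the emulator is unused.

```shell
ADB="${ADB:-$HOME/android-sdk/platform-tools/adb}"
EMU_SERIAL="${EMU_SERIAL:-10.0.20.200:5555}"

# Put the emulator screen to sleep, then drop this client's adb connection.
emu_done() {
  "$ADB" -s "$EMU_SERIAL" shell input keyevent KEYCODE_SLEEP
  "$ADB" disconnect "$EMU_SERIAL"
}
```

In a script, `trap emu_done EXIT` placed after the connect step runs the helper even when the session ends with an error.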

Remote access

https://android-emulator.viktorbarzin.me (Cloudflare-proxied, Authentik-gated) serves the same noVNC screen for off-LAN use. adb stays LAN-only by design.