infra

Author	SHA1	Message	Date
Viktor Barzin	e5291f97c8	android-emulator: api36-v8 — auto-fit emulator window to the display noVNC scaled correctly but the emulator's Qt window opened small (~411x914) and floated inside the 1080x2280 Xvfb, so the user saw a tiny phone in a sea of black. v8 bakes a background fitter (wmctrl+xdotool) that, after boot, auto-OKs the one-shot nested-virtualization warning dialog, fills the phone window to the display, and parks the control strip off the right edge — re-running to catch window/dialog timing then maintaining every 30s. Applied live to the running pod already; this makes it survive the next wake.	2026-06-12 20:44:29 +00:00
Viktor Barzin	12fd1fcbc9	android-emulator: api36-v7 — noVNC defaults: scaled view, autoconnect, reconnect Viktor's screen rendered unscaled on a bare /vnc.html. The entrypoint now writes /usr/share/novnc/defaults.json (resize=scale, autoconnect, reconnect with 2s delay, shared) so every load behaves right without URL params, and viewers self-heal across pod restarts/wakes. Already applied live to the running pod; this makes it survive the next wake.	2026-06-12 20:18:26 +00:00
Viktor Barzin	0491fc43f2	android-emulator: README — final measured profile; honest GL story Trues the runbook up to reality: guest GL stays software (llvmpipe) under Xvfb by deliberate choice (NVIDIA headless GL would need a different streaming architecture), the GPU slice costs ~100MiB VRAM only while awake, and the awake steady-state is ~0.5-1.3 cores / ~5Gi with scale-to-zero covering idle.	2026-06-12 20:11:55 +00:00
Viktor Barzin	3802967290	android-emulator: api36-v6 — cap RLIMIT_NOFILE; x11vnc -nolookup Viktor's noVNC sat at 'Connecting…' forever: the WebSocket traversed Cloudflare/Authentik/websockify fine, but x11vnc never sent the RFB banner — strace showed it sweeping the container's fd table with one fcntl per fd, and containerd grants RLIMIT_NOFILE=2147483584 here, so each connection effectively never completed. The entrypoint now sets ulimit -n 65536 for everything it launches (verified live: banner answers instantly under the capped limit); x11vnc also gets -nolookup so client reverse-DNS can never stall handshakes.	2026-06-12 20:04:42 +00:00
Viktor Barzin	b2bd859a8e	android-emulator: NVIDIA_DRIVER_CAPABILITIES=all — graphics libs for -gpu host First GPU boot verified qemu attached to the T4, but the guest GL translator reported llvmpipe: the GPU operator injects only compute,utility by default, so the NVIDIA EGL/GL vendor libraries were absent and gfxstream silently fell back to software GL. The graphics capability completes the hardware rendering path.	2026-06-12 19:43:25 +00:00
Viktor Barzin	16adda2c48	android-emulator: gate reaches the kube API via env vars, not DNS First real wake attempt 500'd: kubernetes.default.svc does not resolve from the gate's alpine pod (musl + injected dns_config ndots quirk), so every kube call failed with 'Name does not resolve'. Use the injected KUBERNETES_SERVICE_HOST/PORT env vars — the canonical in-cluster endpoint, no DNS dependency. ConfigMap checksum annotation rolls the gate automatically.	2026-06-12 19:32:34 +00:00
Viktor Barzin	b985686661	android-emulator: non-merge apply trigger (GPU + wake gate)	2026-06-12 07:53:38 +00:00
Viktor Barzin	18ccd57b63	Merge forgejo/master into wizard/emu-gpu	2026-06-12 07:53:12 +00:00
Viktor Barzin	f4dd515fd7	android-emulator: GPU rendering on node1 + scale-to-zero wake gate Viktor's direction (2026-06-12): the emulator is dev-only, so it should be on-demand, and it should use the T4 where applicable. (1) api36-v5 runs '-gpu host' on the GPU node (nodeSelector + time-slice + EGL libs; automatic swiftshader fallback if GPU init dies) — screen-on rendering moves off the CPU (~5 cores → expected 1-2). (2) The wake gate (stdlib python, owns / on both hostnames) scales the deployment 0→1 on visit and hands the browser to noVNC when ready; agents GET /wake + /status. The idle-sleeper CronJob counts established adb/noVNC connections via /proc/net/tcp (excluding the in-container loopback adb client) and scales to zero after 4 idle checks (~1h). TF ignores replicas drift. VRAM cost (~0.5-1GiB) is held only while awake, protecting llama-swap headroom.	2026-06-12 07:52:50 +00:00
Viktor Barzin	b598c61c61	android-emulator: scale to 0 — its CPU burn was starving etcd The cluster-health check found the control plane flapping: kube-scheduler and kube-controller-manager were crashlooping (220+ restarts) on lost leader-election leases, with "etcdserver: request timed out" in the logs. Root cause: the android-emulator pod's ~4.7-core swiftshader (software-GPU) CPU burn on node3, together with frigate on node1, saturated the single Proxmox host (load ~64) and starved etcd's disk/CPU on the k8s-master VM — so etcd timed out and the leader-election controllers died and restarted in a loop. The emulator is a shared test instance, not a 24/7 service, so scaling it to 0 is the right relief: spin it back to replicas=1 on-demand for a testing session. Confirmed recovery after scaling down: node3 CPU 83%->28%, PVE load 64->51, control-plane restarts frozen. Durable structural fix (etcd/critical VM disks off the shared sdc HDD; PVE CPU weighting) is tracked as code-oflt. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-12 07:31:46 +00:00
Viktor Barzin	db63cd7501	android-emulator+traefik: non-merge apply trigger for the rate-limit fix Pipeline 102 applied nothing — the rate-limit commit entered master under a merge head and the changed-stack detector is blind to merge diffs. Plain commit touching both stacks so they apply.	2026-06-12 00:33:10 +00:00
Viktor Barzin	152dad0a40	android-emulator: dedicated rate-limit — noVNC's module storm tripped the shared 10/50 limiter Viktor's 'VNC stuck loading forever' (remote network): noVNC 1.3 is unbundled and fetches ~60 ES modules in parallel on page open; the shared Traefik rate-limit (average 10, burst 50) 429s the tail and noVNC's loader waits on the missing modules indefinitely (reproduced: 38x429 in a 90-request burst through the ingress). Adds a dedicated 50/300 android-emulator-rate-limit middleware (actualbudget/immich pattern) and opts both emulator ingresses out of the shared limiter.	2026-06-12 00:25:44 +00:00
Viktor Barzin	d818f7ed3b	android-emulator: README — measured resource profile + remote access + screen-off etiquette	2026-06-12 00:10:03 +00:00
Viktor Barzin	43d2107760	android-emulator: public Authentik-gated ingress for the noVNC screen Viktor wants the emulator screen reachable over the web: adds android-emulator.viktorbarzin.me (Cloudflare-proxied) behind Authentik forward-auth — same-origin WebSockets through forward-auth are proven by the terminal/ttyd stack. The LAN .lan view stays, and adb:5555 remains LAN-only since it is unauthenticated.	2026-06-12 00:07:49 +00:00
Viktor Barzin	02ed3062f6	android-emulator: non-merge apply trigger for v4 image rollout Pipeline 96 applied only tripit: the v4 bump (`577267cd`) entered master inside a merge whose first-parent diff hid stacks/android-emulator from the stack detector — same failure mode as the tts `798b0255` trigger. This plain commit touches the stack so the detector picks it up.	2026-06-11 23:48:16 +00:00
Viktor Barzin	577267cd97	android-emulator: api36-v4 — pin emulator 36.1.9; bind socat to pod IP Two final fixes from the live debugging session: (1) sdkmanager-latest emulator 36.6.11 hangs before executing a single guest instruction in this pod (KVM and TCG alike, every gpu mode, crash-reporting on or off) while 36.1.9 boots Android in ~107s — the entrypoint now pins build 13823996 on the PVC; (2) the emulator already listens on 127.0.0.1:5555, so socat's wildcard bind died with EADDRINUSE and its exit restarted the pod right after a successful boot — socat now binds the pod IP only.	2026-06-11 22:52:54 +00:00
Viktor Barzin	85dbec6108	android-emulator: api36-v3 — avdmanager must run from inside the SDK root v2's marker fix proved the install completes, but avdmanager still saw no system images: it IGNORES ANDROID_SDK_ROOT (and has no --sdk_root), deriving the SDK root from its own toolsdir — /opt/android in our image, while packages live on the PVC at /sdk. v3 seeds cmdline-tools into /sdk/cmdline-tools/latest once and runs avdmanager from there, so it resolves the PVC as the SDK root.	2026-06-11 21:15:50 +00:00
Viktor Barzin	5e8a988858	android-emulator: api36-v2 — marker-file install idempotency + retries First boot crashed mid-SDK-install, and the dir-existence check then skipped reinstall forever: avdmanager saw the partial tree and died with 'Valid system image paths are: null' (CrashLoopBackOff). v2 tracks install completion with a marker file written only after sdkmanager succeeds + package.xml exists, wipes partial system-image trees before reinstalling, and retries sdkmanager 3x.	2026-06-11 20:59:08 +00:00
Viktor Barzin	3fac45febc	android-emulator: drop applied import stanzas; deployment recreates fresh The five imports from the last recovery commit are in state now (verified serial 4: everything except the deployment). The deployment kept falling out of state between runs, so instead of a third import round the broken 0-replica deployment object was deleted live (transient recovery step, presence-claimed) and this apply recreates it Terraform-owned with the quota-fitting 3Gi requests. Import stanzas must go because TF 1.5 errors on importing already-managed addresses.	2026-06-11 20:49:37 +00:00
Viktor Barzin	6b7efcd2d6	android-emulator: import the five resources still missing from state Pipeline 88 imported the namespace but its refresh dropped the PVC, both services, the ingress and the tls secret from state (PG-backend state races on this new stack's first applies), so the apply again died on 'already exists' conflicts. State now holds namespace+deployment; adopt the missing five with import blocks (TF 1.5 errors on importing already-managed addresses, so only the missing set is listed). Stanzas come out once applied.	2026-06-11 20:44:09 +00:00
Viktor Barzin	b948224008	android-emulator: import orphaned namespace into state (lock-race recovery) Pipeline 85 created the namespace but a Terraform pg-backend workspace-creation lock race (new stack schema initializing while other stacks applied concurrently) left it out of the recorded state — every later apply then died with 'namespaces android-emulator already exists'. Adopt it with an import block per the house recovery pattern; stanza gets removed once it has applied.	2026-06-11 20:38:46 +00:00
Viktor Barzin	99c19584f7	android-emulator: fit pod inside the tier-1 ResourceQuota (Burstable memory) First deploy hit 'exceeded quota: tier-quota, requested requests.memory=8Gi, limited 4Gi' — the generated tier-1 quota caps memory REQUESTS at 4Gi but allows 32Gi of limits, so go Burstable (requests 3Gi, limits 8Gi) like tiers 3/4 do, instead of opting the namespace out via custom-quota.	2026-06-11 19:56:09 +00:00
Viktor Barzin	8b7c77c794	android-emulator: new stack — shared in-cluster Android 16 testing instance Viktor is setting up an Android app development pipeline (tripit is the first app) and wants agents to natively test changes on Android before shipping. This adds the testing environment: an API-36 Google emulator under KVM as a privileged pod (namespace joins the Kyverno exclude list), SDK/system-image/AVD on a proxmox-lvm PVC, adb on the shared MetalLB IP 10.0.20.200:5555 (LAN only), noVNC screen view at android-emulator.viktorbarzin.lan. Image is built manually from the stack's docker/ dir (rare rebuilds; off-infra-CI rule targets repeated builds). First infra ADR records the trade-offs (devvm/VM/redroid/budtmo rejected).	2026-06-11 19:51:57 +00:00

23 commits
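
Illustration for f4dd515fd7 (idle-sleeper): the commit describes counting established adb/noVNC connections via /proc/net/tcp while excluding the in-container loopback adb client. The following is a minimal sketch of that idea, not the repository's actual script. The function names, the sample table, and the noVNC port 6080 are assumptions made for illustration; only adb on 5555 comes from the log. It also assumes IPv4 entries on a little-endian host, which is how the kernel prints addresses in this file on x86.

```python
import socket
import struct

ESTABLISHED = "01"  # TCP_ESTABLISHED as printed in the "st" column


def parse_addr(field):
    """Decode 'HEXIP:HEXPORT' from /proc/net/tcp (IPv4, little-endian host)."""
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def count_active(proc_net_tcp_text, watched_ports=(5555, 6080)):
    """Count established connections to watched local ports from non-loopback peers.

    watched_ports is hypothetical: 5555 is adb (from the log), 6080 is assumed
    for noVNC/websockify.
    """
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != ESTABLISHED:
            continue
        _, local_port = parse_addr(fields[1])
        remote_ip, _ = parse_addr(fields[2])
        if local_port not in watched_ports:
            continue
        if remote_ip.startswith("127."):
            continue  # the in-container loopback adb client does not count as use
        count += 1
    return count


# Hand-written sample in the /proc/net/tcp layout (trailing columns omitted).
SAMPLE = """\
  sl  local_address rem_address   st
   0: 00000000:15B3 00000000:0000 0A
   1: 0100007F:15B3 0100007F:A1B2 01
   2: 0A01000A:15B3 1400000A:C350 01
   3: 0A01000A:17C0 1400000A:C351 01
   4: 0A01000A:0016 1400000A:C352 01
"""

if __name__ == "__main__":
    print(count_active(SAMPLE))
```

In the sample, row 0 is a listener, row 1 is the loopback adb client, and row 4 is an unrelated port, so only rows 2 and 3 count. A real sleeper would read the file from inside the emulator pod and, per the commit, scale to zero only after 4 consecutive idle checks.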