infra/scripts/workstation/packages.txt
Viktor Barzin de163aa6af
workstation: switch devvm OOM backstop from systemd-oomd to earlyoom
The systemd-oomd backstop added in the previous commit is INERT on this box.
oomd's memory-pressure kill only acts on cgroups doing active reclaim (pgscan
rising); with MemorySwapMax=0 + anonymous agent memory there is nothing to
reclaim, so pgscan stays 0 and oomd never fires. Proven live: a cgroup held at
96-99% memory.pressure for >70s with pgscan=0 was never killed (pressure from a
memory balloon, watched via oomctl). The same swap=0 that stops the IO storm
also neuters oomd.
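A minimal Python model of the gate described above, for replaying the numbers. It parses the PSI format of a cgroup v2 `memory.pressure` file and requires both high pressure and a rising `pgscan` counter before "killing". The helper names, the 60% limit, and the use of the `some avg10` figure are assumptions of this sketch. It is not systemd-oomd's actual implementation.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text such as a cgroup v2 memory.pressure file.

    Each line looks like 'some avg10=97.50 avg60=96.10 avg300=80.00 total=123456'
    and becomes {'some': {'avg10': 97.5, 'avg60': 96.1, ...}}.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def oomd_would_act(psi_text: str, pgscan_before: int, pgscan_after: int,
                   limit_pct: float = 60.0, kind: str = "some") -> bool:
    """Hypothetical gate: act only if pressure is high AND reclaim is active.

    With swap=0 and anonymous memory nothing is reclaimable, so pgscan does not
    move and this returns False however high the pressure climbs.
    """
    pressure = parse_psi(psi_text)[kind]["avg10"]  # assumed window: avg10
    reclaiming = pgscan_after > pgscan_before      # pgscan rising = active reclaim
    return pressure >= limit_pct and reclaiming
```

Replaying the observation above (roughly 98% pressure with a pgscan delta of 0) returns False, which matches the inert behaviour seen live.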

Replace it with earlyoom, which watches free RAM (MemAvailable%) and is
swap-independent: SIGTERM the biggest task at 5%, SIGKILL at 3%, swap ignored
(-s 100). Its --avoid regex covers sshd/systemd/dockerd/containerd/t3-dispatch/
tmux (so the admin's way in always survives) and its --prefer regex targets the
agent/browser hogs. Verified via --dryrun: it fires on the RAM threshold and
selects a chrome process, not a protected daemon.
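The thresholds and targeting above can be modelled in plain Python. This is an illustrative sketch of the described behaviour, not earlyoom's source. Reading MemAvailable/MemTotal from /proc/meminfo is standard. The example regexes, the process names in PREFER, and the flat ±300 score adjustment for --prefer/--avoid are assumptions; `earlyoom -h` on the installed version is the authority. On the command line the 5%/3% pair is spelled `-m 5,3`.

```python
import re
from typing import Optional

TERM_PCT = 5.0  # send SIGTERM once MemAvailable drops below 5% of MemTotal
KILL_PCT = 3.0  # escalate to SIGKILL below 3%

# Hypothetical patterns assembled from the names in the commit message.
AVOID = re.compile(r"^(sshd|systemd|dockerd|containerd|t3-dispatch|tmux)$")
PREFER = re.compile(r"^(chrome|node|claude)$")  # assumed agent/browser process names
ADJUST = 300  # assumed score penalty/bonus for --avoid/--prefer matches


def mem_available_pct(meminfo: str) -> float:
    """MemAvailable as a percentage of MemTotal, from /proc/meminfo text (kB values)."""
    fields: dict[str, int] = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key.strip()] = int(rest.split()[0])
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def choose_signal(avail_pct: float) -> Optional[str]:
    """Decide on RAM alone; swap is never consulted (the effect of -s 100)."""
    if avail_pct < KILL_PCT:
        return "SIGKILL"
    if avail_pct < TERM_PCT:
        return "SIGTERM"
    return None


def pick_victim(procs: list[tuple[str, int]]) -> Optional[str]:
    """Return the process name with the highest adjusted oom_score.

    procs holds (comm, oom_score) pairs, as read from /proc/<pid>/comm and oom_score.
    """
    best_name: Optional[str] = None
    best_score: Optional[int] = None
    for name, score in procs:
        if AVOID.search(name):
            score -= ADJUST
        elif PREFER.search(name):
            score += ADJUST
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name
```

With 4% available, `choose_signal` returns "SIGTERM", and `pick_victim([("dockerd", 900), ("chrome", 700)])` picks "chrome" (adjusted 1000 vs 600), mirroring the dry-run result.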

The per-cgroup caps (MemoryHigh=12G/MemoryMax=16G/MemorySwapMax=0 per user;
docker.slice capped at 8G) are unchanged and remain the PRIMARY guard; earlyoom
is the aggregate net for the rare all-users-maxed case. systemd-oomd is purged,
and its config and ManagedOOM drop-ins are removed. Post-mortem updated with
the finding.
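The "all-users-maxed" case is plain arithmetic: each MemoryMax bounds one cgroup, not the sum. The sketch below parses systemd-style sizes (K/M/G/T suffixes are base 1024) and checks whether the caps can add up to more than the host's RAM. The user count and host RAM are hypothetical inputs that do not appear in this commit. Memory outside these slices (system services, the kernel) is ignored.

```python
GIB = 1024 ** 3
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(value: str) -> int:
    """Parse a systemd memory size such as '16G' (binary suffixes) into bytes.

    Only plain byte counts and K/M/G/T suffixes are handled; percentages and
    'infinity' are out of scope for this sketch.
    """
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(float(value[:-1]) * UNITS[suffix])
    return int(value)


def worst_case_bytes(n_users: int, per_user_max: str = "16G",
                     docker_max: str = "8G") -> int:
    """Total the caps permit if every user slice and docker.slice sit at MemoryMax."""
    return n_users * parse_size(per_user_max) + parse_size(docker_max)


def caps_oversubscribe(n_users: int, host_ram: str) -> bool:
    """True when the caps sum past host RAM, so the caps alone cannot rule out
    host-wide exhaustion and only the earlyoom backstop remains."""
    return worst_case_bytes(n_users) > parse_size(host_ram)
```

On a hypothetical 64G host, three maxed users plus Docker come to 56 GiB, which fits. A fourth user brings the total to 72 GiB, and no per-cgroup cap prevents that.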

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 10:39:16 +00:00
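The manifest's header documents its consumer as `apt-get install -y $(grep -vE '^\s*(#|$)' packages.txt)`. The substitution is unquoted, and grep only drops whole-line comments and blank lines. A trailing `# note` after a package name therefore survives and is word-split into extra apt-get arguments. A small Python lint that mirrors that filter can catch this. The function names are hypothetical, and shell glob expansion is not modelled. The package-name pattern follows Debian Policy's naming rule: lowercase letters, digits, `+`, `-` and `.`, at least two characters, with an alphanumeric first character.

```python
import re

# Mirrors the documented filter grep -vE '^\s*(#|$)': drop comment-only and blank lines.
SKIP = re.compile(r"^\s*(#|$)")
# Debian package name: alphanumeric first char, then [a-z0-9+.-], at least 2 chars total.
PKG = re.compile(r"^[a-z0-9][a-z0-9+.-]+$")


def apt_args(manifest: str) -> list[str]:
    """Words the shell would hand to apt-get from the unquoted $(grep ...) substitution."""
    words: list[str] = []
    for line in manifest.splitlines():
        if not SKIP.match(line):
            words.extend(line.split())  # approximates IFS word splitting
    return words


def lint(manifest: str) -> list[str]:
    """Report surviving lines that are not exactly one valid package name."""
    problems: list[str] = []
    for n, line in enumerate(manifest.splitlines(), start=1):
        if SKIP.match(line):
            continue
        tokens = line.split()
        if len(tokens) != 1 or not PKG.match(tokens[0]):
            problems.append(f"line {n}: {line.strip()!r}")
    return problems
```

Running `lint()` in CI would surface such a line before apt-get fails with "Unable to locate package".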

# Declarative host toolset for the devvm Workstation (apt packages, one per line).
# Consumed by setup-devvm.sh: apt-get install -y $(grep -vE '^\s*(#|$)' packages.txt)
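# Whole-line comments (#) and blank lines are ignored; a trailing '# ...' after a
# package name is NOT stripped (it would reach apt-get as extra words), so keep
# comments on their own line. Tools NOT in the standard apt repos are listed below
# as comments with their real install path (handled explicitly in setup-devvm.sh)
# so this manifest stays a safe argument to `apt-get install`.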
git
zsh
tmux
ripgrep
fd-find
jq
curl
ca-certificates
python3
python3-yaml
python3-pip
podman
# build/runtime deps of setup-devvm.sh itself (added 2026-06-10 reproducibility audit):
# golang-go: builds the t3-dispatch binary (setup-devvm.sh section 9)
golang-go
# unzip: extracts the kubelogin release zip (section 3; python3 zipfile is the fallback)
unzip
# build-essential: cgo + npm native-module builds
build-essential
# core workstation tools (previously installed by hand and missing from the manifest):
rsync
wget
tree
shellcheck
# resource containment: earlyoom backstop (setup-devvm.sh section 10, 2026-06-22),
# a free-RAM-threshold OOM killer used INSTEAD of systemd-oomd, which is inert with
# swap=0 (its pressure kill needs reclaim/pgscan that no-swap anonymous workloads
# never produce; verified live: 99% memory.pressure, pgscan=0, no kill). earlyoom
# watches MemAvailable% and is swap-independent.
earlyoom
# --- installed by setup-devvm.sh via NON-apt paths (not apt-installable) ---
# nodejs + npm -> NodeSource repo (claude-code needs node >= 18; distro nodejs is too old)
# gh (GitHub CLI) -> GitHub's own apt repo (cli.github.com), NOT in the default Ubuntu repos
# @anthropic-ai/claude-code -> npm install -g
# kubectl -> k8s apt repo OR pinned binary (already present on devvm)
# vault -> HashiCorp apt repo OR pinned binary (already present on devvm)
# kubelogin (kubectl oidc-login) -> `kubectl krew install oidc-login` or int128/kubelogin release.
# NOTE: the apt package literally named "kubelogin" is the AZURE
# tool, NOT the OIDC one we need -- do not apt-install it.
# --- apt-installed: monitoring ---
# Scraped by Prometheus job `devvm` (stacks/monitoring): pressure/swap/load
# history for t3 drop attribution (docs/runbooks/t3-drop-attribution.md).
prometheus-node-exporter