The shared devvm keeps overloading and had to be hard-killed again today (2026-06-22): a runaway in one user's ssh/tmux session (a 10G ugrep, plus stacked max-effort agents) grew unbounded, spilled into the disk swap, and swap-thrashed the throttled virtual disk into an IO storm until the box wedged.

Root cause: ssh/tmux work runs under user-<uid>.slice, left memory-uncontained by the explicit 2026-06-10 "swap-only" decision, while only the t3-serve tree was capped. So one user could starve everyone.

This change:
- bounds every user on BOTH trees (MemoryHigh=12G, MemoryMax=16G, MemorySwapMax=0) so work OOMs locally at its ceiling instead of thrashing swap;
- adds a systemd-oomd PSI backstop that sheds the single worst work cgroup under box-wide pressure while leaving system.slice (sshd/services/your way in) protected;
- gives system.slice a fair-share CPU/IO priority edge;
- routes docker containers into a capped, oomd-policed docker.slice so they can't dodge the caps or mis-target oomd.

All of this is durable in setup-devvm.sh so a VM rebuild reproduces it; systemd-oomd is added to packages.txt.

Applied live and verified: oomctl shows the backstop armed (not dry-run) on the work slices with system.slice protected; a capped-balloon stress test OOM-killed locally at the ceiling with swap flat (no thrash).

Post-mortem: docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
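As a rough illustration of the per-user caps described in the commit message, the sketch below renders the three memory settings as a systemd slice drop-in. Only the three values come from the commit message. The drop-in directory (`user-.slice.d`), the file name `50-memory-caps.conf`, and the `ROOT` scratch prefix are assumptions made for this example; the real layout lives in setup-devvm.sh and may differ.

```shell
#!/usr/bin/env bash
# Sketch only: render the per-user memory caps from the commit message as a
# systemd drop-in. Path and file name are ASSUMPTIONS, not taken from
# setup-devvm.sh.
set -euo pipefail

# Write under a scratch root by default so the sketch is safe to run anywhere.
ROOT="${ROOT:-$(mktemp -d)}"
# A "user-.slice.d" drop-in applies to every user-<uid>.slice (assumed location).
DROPIN_DIR="$ROOT/etc/systemd/system/user-.slice.d"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/50-memory-caps.conf" <<'EOF'
[Slice]
# Above 12G the kernel throttles the cgroup and reclaims aggressively.
MemoryHigh=12G
# Hard ceiling: at 16G the cgroup is OOM-killed locally.
MemoryMax=16G
# No swap for this cgroup, so it cannot thrash the shared disk.
MemorySwapMax=0
EOF

echo "wrote $DROPIN_DIR/50-memory-caps.conf"
```

On a real host such a file would be followed by `systemctl daemon-reload`; the sketch deliberately stops short of touching the running system.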
44 lines
2.1 KiB
Text
# Declarative host toolset for the devvm Workstation (apt packages, one per line).
# Consumed by setup-devvm.sh: apt-get install -y $(grep -vE '^\s*(#|$)' packages.txt)
# Comments (#) and blank lines are ignored. Tools NOT in the standard apt repos
# are listed below as comments with their real install path (handled explicitly
# in setup-devvm.sh) so this manifest stays a safe argument to `apt-get install`.
git
zsh
tmux
ripgrep
fd-find
jq
curl
ca-certificates
python3
python3-yaml
python3-pip
podman
# build/runtime deps of setup-devvm.sh itself (added 2026-06-10 reproducibility audit):
# golang-go: builds the t3-dispatch binary (setup-devvm.sh section 9)
golang-go
# unzip: extracts the kubelogin release zip (section 3; python3 zipfile is the fallback)
unzip
# build-essential: cgo + npm native-module builds
build-essential
# core workstation tools (were manually-installed, not captured in the manifest):
rsync
wget
tree
shellcheck
# resource containment — the systemd-oomd backstop (setup-devvm.sh §10, 2026-06-22):
# a PSI-based, cgroup-aware OOM killer that sheds the single worst work cgroup
# before the box swap-thrashes/wedges. Ships SEPARATELY from core systemd on Ubuntu.
systemd-oomd
# --- installed by setup-devvm.sh via NON-apt paths (not apt-installable) ---
# nodejs + npm -> NodeSource repo (claude-code needs node >= 18; distro nodejs is too old)
# gh (GitHub CLI) -> GitHub's own apt repo (cli.github.com), NOT in the default Ubuntu repos
# @anthropic-ai/claude-code -> npm install -g
# kubectl -> k8s apt repo OR pinned binary (already present on devvm)
# vault -> HashiCorp apt repo OR pinned binary (already present on devvm)
# kubelogin (kubectl oidc-login) -> `kubectl krew install oidc-login` or int128/kubelogin release.
# NOTE: the apt package literally named "kubelogin" is the AZURE
# tool, NOT the OIDC one we need -- do not apt-install it.
# Scraped by Prometheus job `devvm` (stacks/monitoring) — pressure/swap/load
# history for t3 drop attribution (docs/runbooks/t3-drop-attribution.md).
prometheus-node-exporter
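
The manifest header quotes the filter that setup-devvm.sh uses to turn this file into `apt-get install` arguments. The sketch below runs that same filter on a small hypothetical sample to show what it does and does not strip: whole-line comments and blank lines are dropped, but a comment trailing a package name survives and would be word-split into extra arguments. The `strict_filter` variant is an assumption added for illustration, not something setup-devvm.sh is known to do. The filter's `\s` is a GNU grep extension, so the sketch assumes a GNU userland as on the devvm.

```shell
#!/usr/bin/env bash
# Sketch only: compare the header's filter with a stricter, HYPOTHETICAL one.
set -euo pipefail

# Hypothetical sample manifest; contents are made up for the demonstration.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
# sample manifest (hypothetical contents)
git

unzip   # trailing comment
EOF

# Filter quoted in the manifest header: drops whole-line comments and blanks only.
header_filter() { grep -vE '^\s*(#|$)' "$1"; }

# Stricter variant (assumption): cut each line at the first '#', then drop
# lines that are empty or whitespace-only.
strict_filter() { sed -E 's/[[:space:]]*#.*$//' "$1" | grep -vE '^[[:space:]]*$'; }

# Count the words the shell would hand to apt-get after word splitting.
header_words=$(header_filter "$manifest" | wc -w)
strict_pkgs=$(strict_filter "$manifest" | xargs)

echo "header filter yields $header_words words"
echo "strict filter yields: $strict_pkgs"
rm -f "$manifest"
```

With the header's filter, the sample produces five words (`git`, `unzip`, `#`, `trailing`, `comment`), which is why keeping comments on their own lines keeps the manifest a safe argument to `apt-get install`.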