infra/scripts/workstation/packages.txt
Viktor Barzin 3a59f4a8bf
workstation: per-user memory caps + systemd-oomd backstop on devvm
The shared devvm keeps overloading and had to be hard-killed again today
(2026-06-22): a runaway in one user's ssh/tmux session (a 10G ugrep, plus
stacked max-effort agents) grew unbounded, spilled into the disk swap, and
swap-thrashed the throttled virtual disk into an IO storm until the box wedged.

Root cause: ssh/tmux work runs under user-<uid>.slice, left memory-uncontained
by the explicit 2026-06-10 "swap-only" decision, while only the t3-serve tree
was capped. So one user could starve everyone.

This bounds every user on BOTH trees (MemoryHigh=12G, MemoryMax=16G,
MemorySwapMax=0 so work OOMs locally at its ceiling instead of thrashing swap),
adds a systemd-oomd PSI backstop that sheds the single worst work cgroup under
box-wide pressure while leaving system.slice (sshd/services/your way in)
protected, gives system.slice a fair-share CPU/IO priority edge, and routes
docker containers into a capped, oomd-policed docker.slice so they can't dodge
the caps or mis-target oomd. All of this is persisted in setup-devvm.sh so a VM
rebuild reproduces it; systemd-oomd is added to packages.txt.
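
For illustration only, the per-user caps described above could be expressed as a
systemd slice drop-in roughly like the sketch below. The three Memory* values come
from this commit message; the helper name `write_user_slice_caps`, the file name
`50-memory-caps.conf`, the target directory, and the `ManagedOOMMemoryPressure=kill`
line are assumptions, not a copy of what setup-devvm.sh actually writes.

```shell
#!/bin/sh
# write_user_slice_caps DIR: write a slice drop-in carrying the per-user caps.
# DIR is a parameter so the sketch can be tried in a scratch directory; on a
# real host it might be /etc/systemd/system/user-.slice.d (assumption).
write_user_slice_caps() {
    mkdir -p "$1" || return 1
    cat > "$1/50-memory-caps.conf" <<'EOF'
[Slice]
# Throttle and reclaim above 12G, hard-kill at 16G, never spill into swap.
MemoryHigh=12G
MemoryMax=16G
MemorySwapMax=0
# Assumption: one way to hand the slice to systemd-oomd's pressure policy.
ManagedOOMMemoryPressure=kill
EOF
}
```

After writing a drop-in like this into the real systemd directory, a
`systemctl daemon-reload` would be needed before the limits take effect.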

Applied live and verified: oomctl shows the backstop armed (not dry-run) on the
work slices with system.slice protected; a capped-balloon stress test OOM-killed
locally at the ceiling with swap flat (no thrash).

Post-mortem: docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 10:25:09 +00:00

# Declarative host toolset for the devvm Workstation (apt packages, one per line).
# Consumed by setup-devvm.sh: apt-get install -y $(grep -vE '^\s*(#|$)' packages.txt)
# Comments (#) and blank lines are ignored. Tools NOT in the standard apt repos
# are listed below as comments with their real install path (handled explicitly
# in setup-devvm.sh) so this manifest stays a safe argument to `apt-get install`.
git
zsh
tmux
ripgrep
fd-find
jq
curl
ca-certificates
python3
python3-yaml
python3-pip
podman
# build/runtime deps of setup-devvm.sh itself (added 2026-06-10 reproducibility audit):
# golang-go: builds the t3-dispatch binary (setup-devvm.sh section 9)
golang-go
# unzip: extracts the kubelogin release zip (section 3; python3 zipfile is the fallback)
unzip
# build-essential: cgo + npm native-module builds
build-essential
# core workstation tools (were manually-installed, not captured in the manifest):
rsync
wget
tree
shellcheck
# resource containment — the systemd-oomd backstop (setup-devvm.sh §10, 2026-06-22):
# a PSI-based, cgroup-aware OOM killer that sheds the single worst work cgroup
# before the box swap-thrashes/wedges. Ships SEPARATELY from core systemd on Ubuntu.
systemd-oomd
# --- installed by setup-devvm.sh via NON-apt paths (not apt-installable) ---
# nodejs + npm -> NodeSource repo (claude-code needs node >= 18; distro nodejs is too old)
# gh (GitHub CLI) -> GitHub's own apt repo (cli.github.com), NOT in the default Ubuntu repos
# @anthropic-ai/claude-code -> npm install -g
# kubectl -> k8s apt repo OR pinned binary (already present on devvm)
# vault -> HashiCorp apt repo OR pinned binary (already present on devvm)
# kubelogin (kubectl oidc-login) -> `kubectl krew install oidc-login` or int128/kubelogin release.
# NOTE: the apt package literally named "kubelogin" is the AZURE
# tool, NOT the OIDC one we need -- do not apt-install it.
# Scraped by Prometheus job `devvm` (stacks/monitoring) — pressure/swap/load
# history for t3 drop attribution (docs/runbooks/t3-drop-attribution.md).
prometheus-node-exporter
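
A minimal sketch of how a manifest in this format can be reduced to a plain package
list. The grep quoted in the header drops only whole-line comments and blank lines;
the sed below additionally strips trailing `# ...` comments, so a package line that
carries an inline note cannot leak extra words into `apt-get install`. The function
name `list_packages` is hypothetical and is not taken from setup-devvm.sh.

```shell
#!/bin/sh
# list_packages FILE: print one package name per line.
# 1st expression: delete everything from an optional run of whitespace plus '#'
#                 to end of line (whole-line and trailing comments alike).
# 2nd expression: delete lines that are now empty or whitespace-only.
list_packages() {
    sed -e 's/[[:space:]]*#.*$//' -e '/^[[:space:]]*$/d' "$1"
}

# Example use (hypothetical; the real call lives in setup-devvm.sh):
#   apt-get install -y $(list_packages packages.txt)
```

Stripping at the first `#` is safe here because apt package names never contain
that character.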