infra/scripts/workstation/packages.txt


# Declarative host toolset for the devvm Workstation (apt packages, one per line).
# Consumed by setup-devvm.sh: apt-get install -y $(grep -vE '^\s*(#|$)' packages.txt)
# Comments (#) and blank lines are ignored. Tools NOT in the standard apt repos
# are listed below as comments with their real install path (handled explicitly
# in setup-devvm.sh) so this manifest stays a safe argument to `apt-get install`.
git
zsh
tmux
ripgrep
fd-find
jq
curl
ca-certificates
python3
python3-yaml
python3-pip
podman
# build/runtime deps of setup-devvm.sh itself (added 2026-06-10 reproducibility audit):
# builds the t3-dispatch binary (setup-devvm.sh section 9)
golang-go
# extracts the kubelogin release zip (section 3; python3 zipfile is the fallback)
unzip
# cgo + npm native-module builds
build-essential
# core workstation tools (were manually-installed, not captured in the manifest):
rsync
wget
tree
shellcheck
workstation: per-user memory caps + systemd-oomd backstop on devvm

The shared devvm keeps overloading and had to be hard-killed again today (2026-06-22): a runaway in one user's ssh/tmux session (a 10G ugrep, plus stacked max-effort agents) grew unbounded, spilled into the disk swap, and swap-thrashed the throttled virtual disk into an IO storm until the box wedged.

Root cause: ssh/tmux work runs under user-<uid>.slice, left memory-uncontained by the explicit 2026-06-10 "swap-only" decision, while only the t3-serve tree was capped. So one user could starve everyone.

This bounds every user on BOTH trees (MemoryHigh=12G, MemoryMax=16G, MemorySwapMax=0 so work OOMs locally at its ceiling instead of thrashing swap), adds a systemd-oomd PSI backstop that sheds the single worst work cgroup under box-wide pressure while leaving system.slice (sshd/services/your way in) protected, gives system.slice a fair-share CPU/IO priority edge, and routes docker containers into a capped, oomd-policed docker.slice so they can't dodge the caps or mis-target oomd.

All durable in setup-devvm.sh so a VM rebuild reproduces them; systemd-oomd added to packages.txt.

Applied live and verified: oomctl shows the backstop armed (not dry-run) on the work slices with system.slice protected; a capped-balloon stress test OOM-killed locally at the ceiling with swap flat (no thrash).

Post-mortem: docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 10:25:09 +00:00
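
The per-user caps named in the commit message above (MemoryHigh=12G, MemoryMax=16G, MemorySwapMax=0) could be expressed as a systemd slice drop-in. The following is a minimal sketch, not the actual contents of setup-devvm.sh: the helper name `render_user_slice_caps` is hypothetical, and the suggested drop-in location is an assumption. Only the three values come from the commit message.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not taken from setup-devvm.sh): prints a [Slice]
# drop-in carrying the memory limits quoted in the commit message.
render_user_slice_caps() {
  cat <<'EOF'
[Slice]
MemoryHigh=12G
MemoryMax=16G
MemorySwapMax=0
EOF
}

# Print to stdout only, so running this sketch changes nothing on the host.
# Assumption: on a real host the output would be written to a drop-in such as
# /etc/systemd/system/user-.slice.d/50-memory-caps.conf (applies to every
# user-<uid>.slice) and followed by `systemctl daemon-reload`.
render_user_slice_caps
```

With MemorySwapMax=0, a cgroup that reaches MemoryMax is OOM-killed inside its own slice rather than being pushed into swap, which matches the behaviour the commit message describes.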
# resource containment — the systemd-oomd backstop (setup-devvm.sh §10, 2026-06-22):
# a PSI-based, cgroup-aware OOM killer that sheds the single worst work cgroup
# before the box swap-thrashes/wedges. Ships SEPARATELY from core systemd on Ubuntu.
systemd-oomd
# --- installed by setup-devvm.sh via NON-apt paths (not apt-installable) ---
# nodejs + npm -> NodeSource repo (claude-code needs node >= 18; distro nodejs is too old)
# gh (GitHub CLI) -> GitHub's own apt repo (cli.github.com), NOT in the default Ubuntu repos
# @anthropic-ai/claude-code -> npm install -g
# kubectl -> k8s apt repo OR pinned binary (already present on devvm)
# vault -> HashiCorp apt repo OR pinned binary (already present on devvm)
# kubelogin (kubectl oidc-login) -> `kubectl krew install oidc-login` or int128/kubelogin release.
# NOTE: the apt package literally named "kubelogin" is the AZURE
# tool, NOT the OIDC one we need -- do not apt-install it.
# Scraped by Prometheus job `devvm` (stacks/monitoring) — pressure/swap/load
# history for t3 drop attribution (docs/runbooks/t3-drop-attribution.md).
prometheus-node-exporter
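
The header states that comments and blank lines are ignored, and quotes a `grep -vE '^\s*(#|$)'` filter that drops only whole-line comments. A minimal sketch of a slightly more defensive reader follows; it also strips a trailing `# ...` comment from a package line. The helper name `list_packages` and the demo manifest are hypothetical and are not taken from setup-devvm.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints one package name per line from a manifest,
# dropping blank lines, whole-line comments, and anything after a '#'.
list_packages() {
  sed -e 's/#.*$//' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d' "$1"
}

# Demo against a throwaway manifest so nothing gets installed.
demo_manifest="$(mktemp)"
printf '%s\n' '# header comment' 'git' '' 'golang-go   # builds a binary' '  jq  ' > "$demo_manifest"
list_packages "$demo_manifest"
rm -f "$demo_manifest"
```

Under these assumptions the install step could then read `apt-get install -y $(list_packages packages.txt)`, relying on word splitting exactly as the command quoted in the header does.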