The systemd-oomd backstop added in the previous commit is INERT on this box. oomd's memory-pressure kill only acts on cgroups doing active reclaim (pgscan rising); with MemorySwapMax=0 and anonymous agent memory there is nothing to reclaim, so pgscan stays at 0 and oomd never fires. Proven live (oomctl + balloon): a cgroup held at 96-99% memory.pressure for >70s with pgscan=0 was never killed. The very swap=0 that kills the IO storm also neuters oomd.

Replace it with earlyoom, which watches free RAM (MemAvailable%) and is swap-independent: it sends SIGTERM to the biggest task at 5% MemAvailable and SIGKILL at 3%, ignoring swap (-s 100). It --avoids sshd/systemd/dockerd/containerd/t3-dispatch/tmux (so the admin's way in always survives) and --prefers the agent/browser hogs. Verified via --dryrun: it fires on the RAM threshold and selects a chrome process, not a protected daemon.

The per-cgroup caps (MemoryHigh=12G/MemoryMax=16G/MemorySwapMax=0 per user, docker.slice 8G) are unchanged and remain the PRIMARY guard; earlyoom is the aggregate net for the rare case where all users are maxed out at once.

systemd-oomd is purged and its config + ManagedOOM drop-ins are removed. Post-mortem updated with the finding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
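The threshold-and-victim policy above can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not earlyoom's implementation: the AVOID regex is an assumed rendering of the daemon list, the PREFER pattern is hypothetical (the real --prefer value is not shown), RSS stands in for "biggest", and avoided/preferred names are filtered and prioritized outright, whereas earlyoom itself picks by oom_score and uses --prefer/--avoid only to bias that choice. Swap is simply left out of the check, which is how this sketch reads "-s 100".

```python
"""Simplified model of the earlyoom policy this commit configures (not earlyoom's code)."""
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds from the commit: SIGTERM below 5% MemAvailable, SIGKILL below 3%.
TERM_PCT = 5.0
KILL_PCT = 3.0

# Assumed regex for the protected daemons named in the commit.
AVOID = re.compile(r"^(sshd|systemd|dockerd|containerd|t3-dispatch|tmux)$")
# Hypothetical "agent/browser hog" pattern; the real --prefer value is not shown.
PREFER = re.compile(r"^(chrome|node)$")


@dataclass
class Proc:
    pid: int
    name: str     # process name the regexes are matched against
    rss_kib: int  # resident set size in KiB, used here as the "biggest" metric


def mem_available_pct(meminfo: str) -> float:
    """MemAvailable as a percentage of MemTotal, from /proc/meminfo-style text."""
    values = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # /proc/meminfo reports kB
    return 100.0 * values["MemAvailable"] / values["MemTotal"]


def signal_for(avail_pct: float) -> Optional[str]:
    """Pick the signal for a MemAvailable%; swap is left out, mirroring -s 100."""
    if avail_pct < KILL_PCT:
        return "SIGKILL"
    if avail_pct < TERM_PCT:
        return "SIGTERM"
    return None


def pick_victim(procs: List[Proc]) -> Optional[Proc]:
    """Drop avoided names, try preferred names first, then take the largest RSS."""
    candidates = [p for p in procs if not AVOID.search(p.name)]
    preferred = [p for p in candidates if PREFER.search(p.name)]
    pool = preferred or candidates
    return max(pool, key=lambda p: p.rss_kib, default=None)


def dry_run(meminfo: str, procs: List[Proc]) -> Optional[Tuple[str, Proc]]:
    """Report (signal, victim) without sending anything, in the spirit of --dryrun."""
    sig = signal_for(mem_available_pct(meminfo))
    if sig is None:
        return None
    victim = pick_victim(procs)
    return (sig, victim) if victim is not None else None


if __name__ == "__main__":
    # Synthetic snapshot: 2,600,000 / 65,536,000 kB is about 3.97% available.
    meminfo = "MemTotal: 65536000 kB\nMemFree: 500000 kB\nMemAvailable: 2600000 kB\n"
    procs = [
        Proc(1, "systemd", 20_000),
        Proc(812, "dockerd", 9_000_000),
        Proc(4242, "chrome", 3_000_000),
        Proc(5150, "python3", 5_000_000),
    ]
    print(dry_run(meminfo, procs))
```

Run as a script, it evaluates the synthetic snapshot: MemAvailable is about 3.97% of MemTotal, so the decision is SIGTERM, and chrome is selected even though dockerd (avoided) and python3 (not preferred) are larger.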
# Declarative host toolset for the devvm Workstation (apt packages, one per line).
# Consumed by setup-devvm.sh: apt-get install -y $(grep -vE '^\s*(#|$)' packages.txt)
# Comment lines (#) and blank lines are ignored; the grep only drops whole lines, so
# never put a comment after a package name. Tools NOT in the standard apt repos
# are listed below as comments with their real install path (handled explicitly
# in setup-devvm.sh) so this manifest stays a safe argument to `apt-get install`.
git
zsh
tmux
ripgrep
fd-find
jq
curl
ca-certificates
python3
python3-yaml
python3-pip
podman
# build/runtime deps of setup-devvm.sh itself (added in the 2026-06-10 reproducibility audit):
# golang-go: builds the t3-dispatch binary (setup-devvm.sh section 9)
golang-go
# unzip: extracts the kubelogin release zip (section 3; python3 zipfile is the fallback)
unzip
# build-essential: cgo + npm native-module builds
build-essential
# core workstation tools (previously installed by hand and not captured in the manifest):
rsync
wget
tree
shellcheck
# resource containment (earlyoom backstop, setup-devvm.sh §10, 2026-06-22): a
# free-RAM-threshold OOM killer used INSTEAD of systemd-oomd, which is inert with
# swap=0 (its pressure-kill needs reclaim/pgscan that no-swap anon workloads never
# produce; verified live: 99% mem.pressure, pgscan=0, no kill). earlyoom watches
# MemAvailable% and is swap-independent.
earlyoom

# --- installed by setup-devvm.sh via NON-apt paths (not apt-installable) ---
# nodejs + npm -> NodeSource repo (claude-code needs node >= 18; distro nodejs is too old)
# gh (GitHub CLI) -> GitHub's own apt repo (cli.github.com), NOT in the default Ubuntu repos
# @anthropic-ai/claude-code -> npm install -g
# kubectl -> k8s apt repo OR pinned binary (already present on devvm)
# vault -> HashiCorp apt repo OR pinned binary (already present on devvm)
# kubelogin (kubectl oidc-login) -> `kubectl krew install oidc-login` or int128/kubelogin release.
# NOTE: the apt package literally named "kubelogin" is the AZURE
# tool, NOT the OIDC one we need -- do not apt-install it.

# Scraped by Prometheus job `devvm` (stacks/monitoring): pressure/swap/load
# history for t3 drop attribution (docs/runbooks/t3-drop-attribution.md).
prometheus-node-exporter