workstation: per-user memory caps + systemd-oomd backstop on devvm
The shared devvm keeps overloading and had to be hard-killed again today (2026-06-22): a runaway in one user's ssh/tmux session (a 10G ugrep, plus stacked max-effort agents) grew unbounded, spilled into the disk swap, and swap-thrashed the throttled virtual disk into an IO storm until the box wedged.

Root cause: ssh/tmux work runs under user-<uid>.slice, left memory-uncontained by the explicit 2026-06-10 "swap-only" decision, while only the t3-serve tree was capped. So one user could starve everyone.

This bounds every user on BOTH trees (MemoryHigh=12G, MemoryMax=16G, MemorySwapMax=0 so work OOMs locally at its ceiling instead of thrashing swap), adds a systemd-oomd PSI backstop that sheds the single worst work cgroup under box-wide pressure while leaving system.slice (sshd/services/your way in) protected, gives system.slice a fair-share CPU/IO priority edge, and routes docker containers into a capped, oomd-policed docker.slice so they can't dodge the caps or mis-target oomd.

All durable in setup-devvm.sh so a VM rebuild reproduces them; systemd-oomd added to packages.txt.

Applied live and verified: oomctl shows the backstop armed (not dry-run) on the work slices with system.slice protected; a capped-balloon stress test OOM-killed locally at the ceiling with swap flat (no thrash).

Post-mortem: docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
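A minimal sketch of how the slice-level settings described above could be written as systemd drop-ins from a shell setup script. It is not the contents of setup-devvm.sh, which this page doesn't show. The helper names (`write_dropin`, `write_memory_caps`, `write_system_priority`, `write_docker_cgroup_parent`, `apply_slice_caps`), the drop-in file names, and the CPUWeight/IOWeight value of 200 are hypothetical. It also assumes dockerd uses the systemd cgroup driver, under which `cgroup-parent` must name a `.slice`. `user-.slice.d` is systemd's prefix drop-in directory, so one file applies to every `user-<uid>.slice`. The t3-serve tree's unit name isn't given here, so it is left out.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers mirroring the slice settings in the
# commit message. File names and the 200 weights are assumptions.
set -euo pipefail

# write_dropin ROOT UNIT NAME: write stdin to ROOT/UNIT.d/NAME.conf
write_dropin() {
  local root="$1" unit="$2" name="$3"
  local dir="${root}/${unit}.d"
  mkdir -p "$dir"
  cat >"${dir}/${name}.conf"
}

# Memory ceiling for a slice. MemorySwapMax=0 means a runaway hits
# MemoryMax and is OOM-killed inside its own slice instead of paging to disk.
write_memory_caps() {
  local root="$1" unit="$2"
  write_dropin "$root" "$unit" 50-memory-caps <<'EOF'
[Slice]
MemoryHigh=12G
MemoryMax=16G
MemorySwapMax=0
EOF
}

# Give system.slice a larger CPU/IO share under contention (default is 100).
write_system_priority() {
  local root="$1"
  write_dropin "$root" system.slice 50-priority <<'EOF'
[Slice]
CPUWeight=200
IOWeight=200
EOF
}

# Put containers under docker.slice instead of the default system.slice.
# NOTE: this overwrites any existing daemon.json.
write_docker_cgroup_parent() {
  local etc_docker="$1"
  mkdir -p "$etc_docker"
  printf '{\n  "cgroup-parent": "docker.slice"\n}\n' >"${etc_docker}/daemon.json"
}

# Apply everything; defaults target the live system paths.
apply_slice_caps() {
  local unit_root="${1:-/etc/systemd/system}" docker_root="${2:-/etc/docker}"
  write_memory_caps "$unit_root" user-.slice   # every user-<uid>.slice
  write_memory_caps "$unit_root" docker.slice  # all containers together
  write_system_priority "$unit_root"
  write_docker_cgroup_parent "$docker_root"
}
```

Called as root with no arguments, `apply_slice_caps` writes into /etc/systemd/system and /etc/docker; it would then need a `systemctl daemon-reload` and a docker restart to take effect. Passing scratch directories instead lets you inspect the generated files first.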
parent 2169e0de5f
commit 3a59f4a8bf
3 changed files with 245 additions and 0 deletions
@@ -24,6 +24,10 @@ rsync
 wget
 tree
 shellcheck
+# resource containment — the systemd-oomd backstop (setup-devvm.sh §10, 2026-06-22):
+# a PSI-based, cgroup-aware OOM killer that sheds the single worst work cgroup
+# before the box swap-thrashes/wedges. Ships SEPARATELY from core systemd on Ubuntu.
+systemd-oomd
 
 # --- installed by setup-devvm.sh via NON-apt paths (not apt-installable) ---
 # nodejs + npm -> NodeSource repo (claude-code needs node >= 18; distro nodejs is too old)
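Once the systemd-oomd package is installed, the PSI backstop is armed per unit with systemd's ManagedOOM* resource-control settings. The sketch below is hedged and assumption-heavy. The helper names (`write_oomd_policy`, `arm_oomd_backstop`), the drop-in file name, the choice of user.slice and docker.slice as the monitored units, and the 50% pressure limit are all hypothetical; the commit doesn't show which units or thresholds it used, or whether it used a separate mechanism to mark system.slice as protected. Here system.slice stays out of scope simply because it never gets a kill policy. Each monitored slice is judged on its own memory pressure, and oomd kills a descendant cgroup of that slice when the limit is exceeded.

```shell
#!/usr/bin/env bash
# Sketch only: arm systemd-oomd's memory-pressure (PSI) policy on work slices.
# Unit choice, file name, and the 50% limit are assumptions, not the commit's.
set -euo pipefail

# write_oomd_policy ROOT UNIT LIMIT: kill policy for UNIT at LIMIT pressure
write_oomd_policy() {
  local root="$1" unit="$2" limit="$3"
  local dir="${root}/${unit}.d"
  mkdir -p "$dir"
  # Unquoted heredoc delimiter so ${limit} expands.
  cat >"${dir}/60-oomd.conf" <<EOF
[Slice]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=${limit}
EOF
}

# Only the work slices are listed; system.slice (sshd and services) is never
# given a kill policy, so this policy never selects cgroups from it.
arm_oomd_backstop() {
  local root="${1:-/etc/systemd/system}" limit="${2:-50%}"
  local unit
  for unit in user.slice docker.slice; do
    write_oomd_policy "$root" "$unit" "$limit"
  done
}
```

On a live box this would be followed by `systemctl daemon-reload` and `systemctl enable --now systemd-oomd`; `oomctl` then lists the monitored cgroups, which is how the commit reports checking that the backstop was armed.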