workstation: per-user memory caps + systemd-oomd backstop on devvm

The shared devvm keeps overloading and had to be hard-killed again today
(2026-06-22): a runaway in one user's ssh/tmux session (a 10G ugrep, plus
stacked max-effort agents) grew unbounded, spilled into the disk swap, and
swap-thrashed the throttled virtual disk into an IO storm until the box wedged.

Root cause: ssh/tmux work runs under user-<uid>.slice, left memory-uncontained
by the explicit 2026-06-10 "swap-only" decision, while only the t3-serve tree
was capped. So one user could starve everyone.

This bounds every user on BOTH trees (MemoryHigh=12G, MemoryMax=16G,
MemorySwapMax=0 so work OOMs locally at its ceiling instead of thrashing swap),
adds a systemd-oomd PSI backstop that sheds the single worst work cgroup under
box-wide pressure while leaving system.slice (sshd/services/your way in)
protected, gives system.slice a fair-share CPU/IO priority edge, and routes
docker containers into a capped, oomd-policed docker.slice so they can't dodge
the caps or mis-target oomd. All of this is made durable in setup-devvm.sh so
a VM rebuild reproduces it, and systemd-oomd is added to packages.txt.
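
For illustration, the per-user caps described above could be expressed as a systemd prefix drop-in along these lines. This is a minimal sketch, not the contents of setup-devvm.sh: the file name 50-memory-cap.conf and the DROPIN_ROOT variable are hypothetical, and enabling ManagedOOMMemoryPressure=kill on the user slices is an assumption about how the oomd backstop is attached.

```shell
#!/usr/bin/env bash
# Sketch only: DROPIN_ROOT and the drop-in file name are illustrative,
# not taken from setup-devvm.sh.
set -euo pipefail

# Point DROPIN_ROOT at /etc/systemd/system on a real host (needs root);
# the default is a scratch directory so the sketch can be dry-run safely.
DROPIN_ROOT="${DROPIN_ROOT:-$(mktemp -d)}"

# user-.slice.d/ is a prefix drop-in: it applies to every user-<uid>.slice.
mkdir -p "$DROPIN_ROOT/user-.slice.d"
cat > "$DROPIN_ROOT/user-.slice.d/50-memory-cap.conf" <<'EOF'
[Slice]
# Throttle and reclaim above 12G, hard OOM at 16G, never spill into swap.
MemoryHigh=12G
MemoryMax=16G
MemorySwapMax=0
# Assumption: let systemd-oomd act on this slice's descendants under
# sustained memory pressure.
ManagedOOMMemoryPressure=kill
EOF

echo "wrote $DROPIN_ROOT/user-.slice.d/50-memory-cap.conf"
# On a real host, follow with: systemctl daemon-reload
```

With MemorySwapMax=0 a process tree that reaches MemoryMax is OOM-killed inside its own slice rather than pushed to swap, which is the "OOMs locally at its ceiling" behaviour the message describes.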

Applied live and verified: oomctl shows the backstop armed (not dry-run) on the
work slices with system.slice protected; a capped-balloon stress test OOM-killed
locally at the ceiling with swap flat (no thrash).
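
The live verification could be spot-checked by reading the cgroup v2 limit files directly. The helper below is a hypothetical sketch (show_caps is not part of the repo, and uid 1000 is only an example); it prints the three memory knobs for whichever slice directory it is given.

```shell
#!/usr/bin/env bash
set -euo pipefail

# show_caps <cgroup-dir>: print the cgroup v2 memory limits for one slice.
# Hypothetical helper, not part of setup-devvm.sh.
show_caps() {
  local dir="$1" knob
  for knob in memory.high memory.max memory.swap.max; do
    # Each knob file holds a byte count or the literal "max" (unlimited).
    printf '%s=%s\n' "$knob" "$(cat "$dir/$knob")"
  done
}

# Real usage (uid 1000 is an example):
#   show_caps /sys/fs/cgroup/user.slice/user-1000.slice
#   oomctl    # dumps the cgroups systemd-oomd is currently monitoring
```

On a host where the caps are in effect, memory.swap.max should read 0 and memory.max should be a finite byte count rather than "max".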

Post-mortem: docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-22 10:25:09 +00:00
parent 2169e0de5f
commit 3a59f4a8bf
3 changed files with 245 additions and 0 deletions

@@ -24,6 +24,10 @@ rsync
wget
tree
shellcheck
# resource containment — the systemd-oomd backstop (setup-devvm.sh §10, 2026-06-22):
# a PSI-based, cgroup-aware OOM killer that sheds the single worst work cgroup
# before the box swap-thrashes/wedges. Ships SEPARATELY from core systemd on Ubuntu.
systemd-oomd
# --- installed by setup-devvm.sh via NON-apt paths (not apt-installable) ---
# nodejs + npm -> NodeSource repo (claude-code needs node >= 18; distro nodejs is too old)