infra

Viktor Barzin 74819d4061 feat(nvidia): GPU VRAM budget + watchdog to stop T4 overallocation		2026-06-30 07:57:40 +00:00

The single time-sliced Tesla T4 has no per-tenant memory isolation, so its ~9 GPU workloads can collectively overallocate VRAM. On 2026-06-02 immich-ml's onnxruntime arena grew to 10.7 GB and silently starved llama-swap, breaking recruiter-responder for ~5h. Viktor asked for memory protection so we don't overallocate GPU memory, and chose to do it at the scheduling level (no device-plugin swap) after weighing HAMi and MPS.

Make the scheduler VRAM-aware and add runtime teeth, all repo-native, time-slicing untouched:

- Advertise a node extended resource viktorbarzin.me/gpumem (~14000 MiB) via a reconcile null_resource (immediate, apply-time) + hourly re-assert CronJob.
- Each always-on GPU tenant declares a gpumem budget (immich-ml 3000, llama-swap 5000, frigate 2000, immich-server 1800, portal-stt 1500; sum 13300 <= advertised) so the scheduler refuses to co-schedule past the card (overflow -> Pending).
- gpu-vram-watchdog Deployment recycles the biggest over-budget tenant ONLY when actual free VRAM < floor. Ships DRY_RUN=true (observe-then-enforce); flip to false after a few cycles look right.
- Prometheus alerts GPUVRAMLow / GPUVRAMTelemetryDown / GPUVRAMWatchdogDown -- the 2026-06-02 post-mortem's never-built free-VRAM follow-up.
- Docs: ADR-0016 (records why HAMi/MPS were rejected), CONTEXT.md GPU-sharing glossary; fix the stale "whole T4 / scale immich-ml to 0" llama-cpp comment.

HITL GPU-node change: apply nvidia FIRST (advertise gpumem), verify the node shows the capacity, THEN the consumer stacks -- the cutover bounces GPU pods.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
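
The budget arithmetic and the watchdog rule described in the commit message above can be sketched as follows. This is a minimal illustration, not the code shipped in the commit: the function names `fits` and `pick_victim`, the 500 MiB floor, and the usage figures in the usage note are hypothetical, and "biggest over-budget tenant" is read here as the tenant with the largest overage above its budget, which is an assumption. Only the per-tenant budgets and the ~14000 MiB capacity come from the commit message.

```python
# Per-tenant gpumem budgets in MiB, as listed in the commit message.
BUDGETS_MIB = {
    "immich-ml": 3000,
    "llama-swap": 5000,
    "frigate": 2000,
    "immich-server": 1800,
    "portal-stt": 1500,
}

# Approximate advertised capacity of viktorbarzin.me/gpumem in MiB.
ADVERTISED_MIB = 14000


def fits(budgets, advertised):
    """Return True when the declared budgets fit in the advertised capacity."""
    return sum(budgets.values()) <= advertised


def pick_victim(usage_mib, budgets, free_mib, floor_mib):
    """Return the tenant to recycle, or None when no action is needed.

    Hypothetical helper: it acts only when free VRAM is below the floor,
    and then picks the tenant with the largest overage above its budget
    (one possible reading of "biggest over-budget tenant").
    """
    if free_mib >= floor_mib:
        return None
    overages = {
        name: used - budgets[name]
        for name, used in usage_mib.items()
        if name in budgets and used > budgets[name]
    }
    if not overages:
        return None
    return max(overages, key=overages.get)
```

For example, `fits(BUDGETS_MIB, ADVERTISED_MIB)` holds because the budgets sum to 13300 MiB. With made-up usage of roughly 10700 MiB for immich-ml, free VRAM of 300 MiB, and a floor of 500 MiB, `pick_victim` would select immich-ml; with free VRAM above the floor it returns `None` even if a tenant is over budget.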
..
0001-android-emulator-in-cluster.md	android-emulator: new stack — shared in-cluster Android 16 testing instance	2026-06-11 19:51:57 +00:00
0002-all-image-builds-off-infra-gha-ghcr.md	docs: ADR-0002 — all owned image builds move off-infra to GHA + ghcr [ci skip]	2026-06-12 19:55:47 +00:00
0003-keep-forgejo-canonical-complete-mirror.md	plotting-book: pull image from private ghcr instead of public DockerHub	2026-06-27 15:32:19 +00:00
0004-homelab-unified-cli.md	homelab: v0.1 docs, distribution wiring, and version	2026-06-18 19:25:51 +00:00
0005-homelab-v01-scope.md	homelab: v0.1 docs, distribution wiring, and version	2026-06-18 19:25:51 +00:00
0006-homelab-work-and-tf.md	homelab: v0.1 docs, distribution wiring, and version	2026-06-18 19:25:51 +00:00
0007-homelab-k8s-verbs.md	homelab: v0.2.0 — docs + version for the k8s verb-group	2026-06-18 22:30:41 +00:00
0008-homelab-memory-verbs.md	homelab: add memory verb-group (v0.3.0) — direct claude-memory HTTP client	2026-06-19 05:56:25 +00:00
0009-homelab-ci-deploy-verbs.md	homelab: v0.4.0 — ci/deploy verbs (watch what you trigger)	2026-06-19 10:59:14 +00:00
0010-homelab-net-obs-verbs.md	homelab: v0.5.0 — net/dns/metrics/logs probes (endpoint resolution)	2026-06-19 11:27:31 +00:00
0011-homelab-usage-telemetry.md	docs(adr): add ADR-0015 (OS/sudo is the authorization boundary), supersede ADR-0011 privacy norm	2026-06-26 08:22:29 +00:00
0012-homelab-ha-verbs.md	homelab ha token: dedicated openclaw/ha-tokens secret + least-priv RBAC for emo	2026-06-21 10:45:32 +00:00
0013-homelab-browser-verbs.md	homelab v0.8.0: browser verbs for headful anti-bot web automation	2026-06-22 12:22:22 +00:00
0014-service-identity-and-east-west-observability.md	monitoring: consolidate all Slack alerting to #alerts, abandon #security	2026-06-26 13:29:44 +00:00
0015-os-is-the-authorization-boundary.md	docs(adr): add ADR-0015 (OS/sudo is the authorization boundary), supersede ADR-0011 privacy norm	2026-06-26 08:22:29 +00:00
0016-gpu-vram-extended-resource-budget.md	feat(nvidia): GPU VRAM budget + watchdog to stop T4 overallocation	2026-06-30 07:57:40 +00:00