From a24cf8c68963a2c328e730bae26b9d719145ab46 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Sat, 18 Apr 2026 13:23:14 +0000
Subject: [PATCH] [docs] post-mortem: clarify the sizeLimit vs container memory limit gotcha
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Initial 2Gi sizeLimit didn't take effect because Kyverno's tier-defaults
LimitRange in authentik ns applies a default container memory limit of
256Mi to pods with resources: {}. Writes to a memory-backed emptyDir
count against the container's cgroup memory, so the container was
OOM-killed (exit 137) at ~256 MiB even though the tmpfs sizeLimit said
2Gi. Confirmed with `dd if=/dev/zero of=/dev/shm/test bs=1M count=500`.

Fix: also set `containers[0].resources.limits.memory: 2560Mi` via the
same kubernetes_json_patches. Verified end-to-end — 1.5 GB file write
succeeds, df -h /dev/shm reports 2.0G.

Updates the post-mortem P1 row to capture this for future readers.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/post-mortems/2026-04-18-authentik-outpost-shm-full.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/post-mortems/2026-04-18-authentik-outpost-shm-full.md b/docs/post-mortems/2026-04-18-authentik-outpost-shm-full.md
index 5cd22fd1..6ee8d870 100644
--- a/docs/post-mortems/2026-04-18-authentik-outpost-shm-full.md
+++ b/docs/post-mortems/2026-04-18-authentik-outpost-shm-full.md
@@ -111,7 +111,7 @@ Contributing distractions:
 |----------|--------|------|---------|--------|
 | P1 | Prometheus alerts on outpost `/dev/shm` fill (two thresholds) | Alert | Group `Authentik Outpost` added in `stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl`. `AuthentikOutpostMemoryHigh` (warning, working set > 1.5 GiB for 15m) + `AuthentikOutpostMemoryCritical` (critical, > 1.8 GiB for 5m) + `AuthentikOutpostRestarts` (warning, > 2 restarts in 30m). Applied 2026-04-18 13:16 UTC; loaded in Prometheus, state=inactive. | **DONE** |
 | P1 | Uptime-Kuma meta-monitor: "N+ external monitors down simultaneously" | Alert | Either a Prometheus rule over `uptime_kuma_monitor_status == 0` counts, or a dedicated external probe. Very strong signal of shared-infra failure. | TODO |
-| P1 | Bump tmpfs `sizeLimit` from 512Mi → 2Gi | Config | Patched outpost `kubernetes_json_patches` via Authentik API. 2026-04-18 13:06 UTC. Gives ~8× growth headroom at current probe rate before needing reconsideration. | **DONE** |
+| P1 | Bump tmpfs `sizeLimit` from 512Mi → 2Gi + set explicit container memory limit 2560Mi | Config | Patched outpost `kubernetes_json_patches` via Authentik API. 2026-04-18 13:06 UTC (sizeLimit), 13:22 UTC (container limit). **Gotcha**: `sizeLimit` alone is insufficient — writes to tmpfs count against container cgroup memory, and Kyverno's `tier-defaults` LimitRange sets a default `limits.memory: 256Mi` which OOM-kills the container before tmpfs fills. Fix is to also set `containers[0].resources.limits.memory` ≥ `sizeLimit + working_set_headroom`. Verified 1.5 GB file write succeeds on the configured pod; df reports 2.0 GB tmpfs. Gives ~8× growth headroom at current probe rate. | **DONE** |
 
 ### P2 — Codify the fix so it survives drift