equalize memory req=lim across 70+ containers using Prometheus 7d max data

After the node2 OOM incident, right-size memory across the cluster by setting requests=limits based on max_over_time(container_memory_working_set_bytes[7d]) with 1.3x headroom. This eliminates the ~37Gi overcommit gap.

Categories:
- Safe equalization (50 containers): set req=lim where the 7d max is well within the target
- Limit increases (8 containers): raise limits for services spiking above their current limit
- No Prometheus data (12 containers): conservatively set lim=req
- Exception: nextcloud keeps req=256Mi/lim=8Gi due to Apache memory spikes

Also increased the dbaas namespace quota from 12Gi to 16Gi to accommodate mysql 4Gi limits across 3 replicas (12Gi in total).
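The sizing rule above can be sketched in Python as follows. This is a minimal illustration, not the script used for this commit. It assumes the input is the per-container result of max_over_time(container_memory_working_set_bytes[7d]) in bytes. The 64Mi rounding step and the `recommend_memory`/`classify` helpers are hypothetical, and `classify` reads "well within target" as "7d max times 1.3 still fits under the current limit", which is an interpretation.

```python
import math
from typing import Optional

MI = 1024 ** 2
HEADROOM = 1.3  # headroom factor stated in the commit message


def recommend_memory(max7d_bytes: float, headroom: float = HEADROOM, step_mi: int = 64) -> str:
    """Return one quantity to use for both requests and limits.

    max7d_bytes: max_over_time(container_memory_working_set_bytes[7d]) for one container.
    step_mi: hypothetical rounding granularity; not taken from the commit.
    """
    target_mi = max7d_bytes * headroom / MI
    # Round up so the result never falls below max7d * headroom.
    rounded_mi = math.ceil(target_mi / step_mi) * step_mi
    if rounded_mi % 1024 == 0:
        return f"{rounded_mi // 1024}Gi"
    return f"{rounded_mi}Mi"


def classify(max7d_bytes: Optional[float], current_limit_bytes: int) -> str:
    """Hypothetical mapping onto the commit's categories."""
    if max7d_bytes is None:
        return "no-data"      # no Prometheus series: set lim=req
    if max7d_bytes * HEADROOM > current_limit_bytes:
        return "raise-limit"  # peak plus headroom exceeds the current limit
    return "equalize"         # fits under the current limit: set req=lim


print(recommend_memory(700 * MI))  # 700Mi * 1.3 = 910Mi, rounded up to 960Mi
print(classify(900 * MI, 1024 * MI))
```

Rounding up rather than to the nearest step keeps the headroom guarantee intact; the nextcloud exception would still need to be handled outside this rule.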
2026-03-14 21:46:49 +00:00
commit 23019da8e5
parent eb0301b02b
39 changed files with 211 additions and 74 deletions
--- a/stacks/platform/modules/authentik/values.yaml
+++ b/stacks/platform/modules/authentik/values.yaml
@@ -20,7 +20,7 @@ server:
  resources:
    requests:
      cpu: 100m
-      memory: 512Mi
+      memory: 1Gi
    limits:
      memory: 1Gi
  topologySpreadConstraints:
@@ -48,7 +48,7 @@ worker:
  resources:
    requests:
      cpu: 100m
-      memory: 384Mi
+      memory: 1Gi
    limits:
      memory: 1Gi
  topologySpreadConstraints:
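As a quick check on the two authentik hunks, the following Python sketch converts the Kubernetes memory quantities to bytes, confirms that each request now equals its limit, and reports how much overcommit each change removes. `parse_quantity` and `removed_gap_mi` are illustrative helpers, not part of any Kubernetes client, and they only handle the integer binary suffixes (Ki, Mi, Gi) that appear in this diff.

```python
# Binary suffixes that appear in this diff; decimal (M, G) and fractional
# quantities are deliberately not handled in this sketch.
SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_quantity(q: str) -> int:
    """Convert an integer quantity such as '512Mi' or '1Gi' to bytes."""
    for suffix, factor in SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # no suffix: plain bytes


def removed_gap_mi(old_request: str, limit: str) -> int:
    """Overcommit (limit - request) eliminated by raising request to limit, in Mi."""
    return (parse_quantity(limit) - parse_quantity(old_request)) // 1024 ** 2


# Values transcribed from the authentik values.yaml hunks:
# (old request, new request, limit)
authentik = {
    "server": ("512Mi", "1Gi", "1Gi"),
    "worker": ("384Mi", "1Gi", "1Gi"),
}

for component, (old_req, new_req, limit) in authentik.items():
    assert parse_quantity(new_req) == parse_quantity(limit), component
    print(f"{component}: req=lim={limit}, gap removed {removed_gap_mi(old_req, limit)}Mi")
```

For these two hunks the removed gap is 512Mi (server) and 640Mi (worker), 1152Mi combined.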