kyverno: strip resources.limits.cpu cluster-wide via ClusterPolicy

Context
-------
The cluster-wide stance is "no CPU limits anywhere": CFS throttling
causes more harm than good for bursty single-threaded workloads
(Node.js, Python). LimitRanges are already correct (defaultRequest.cpu
only, no
default.cpu), but 22 pods still carried CPU limits injected by upstream
Helm chart defaults — CrowdSec (lapi + agents), descheduler,
kubernetes-dashboard (×4), nvidia gpu-operator.

Previous attempts were ad-hoc: patch each values.yaml, occasionally
missing things on chart upgrade. This replaces that with a declarative
Kyverno mutation at admission time.

This change
-----------
Adds a new ClusterPolicy `strip-cpu-limits` with two foreach rules:

  strip-container-cpu-limit      → containers[]
  strip-initcontainer-cpu-limit  → initContainers[]

Each rule uses `patchesJson6902` with an `op: remove` on
`resources/limits/cpu`. JSON6902 `remove` fails on missing paths, so a
per-element precondition gates the mutation: pods without CPU limits
pass through untouched. A top-level rule precondition short-circuits
with a JMESPath filter (`[?resources.limits.cpu != null] | length(@)`,
compared against 0), so the rule is a no-op for the overwhelming
majority of pods.
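
The mutation can also be exercised offline with the Kyverno CLI before
it ever reaches the cluster. A minimal sketch, assuming the CLI is
installed and the ClusterPolicy plus a sample pod have been rendered to
plain YAML (file names are placeholders):

  # dry-run the policy against the pod; prints the mutated resource
  kyverno apply strip-cpu-limits.yaml --resource pod-with-cpu-limit.yaml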

Admission-time only. No `mutateExistingOnPolicyUpdate`, no `background`.
Existing pods keep their CPU limits until they're restarted naturally
(Helm upgrade, node drain, rollout). We rely on churn, not forced
restarts, to avoid unnecessary thrash.

Memory limits are preserved — they prevent OOM, still useful.

Flow
----

    admission request → match Pod + CREATE
                     → top-level precondition: any container has limits.cpu?
                           no  → skip (fast path)
                           yes → foreach container:
                                   element.limits.cpu present?
                                       no  → skip element
                                       yes → remove /spec/containers/N/resources/limits/cpu
                     → same again for initContainers
                     → mutated pod proceeds to API server
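
The flow can be exercised without persisting anything via a server-side
dry run, since admission webhooks should still be invoked for dry-run
requests. A sketch (manifest name is a placeholder):

  kubectl apply -f pod-with-cpu-limit.yaml --dry-run=server -o yaml \
    | grep -A4 'resources:'   # CPU limit should already be gone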

Verification
------------
  kubectl run test-strip-cpu --overrides='{limits:{cpu:500m,memory:64Mi}}'
    → admitted pod.resources = {limits:{memory:64Mi}, requests:{cpu:50m,memory:32Mi}}
    → CPU limit stripped, memory preserved, requests untouched
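
The override above is abbreviated; a full invocation looks roughly like
this (image is a placeholder; requests are filled in by the namespace
LimitRange):

  kubectl run test-strip-cpu --image=nginx --restart=Never \
    --overrides='{"apiVersion":"v1","spec":{"containers":[{"name":"test-strip-cpu","image":"nginx","resources":{"limits":{"cpu":"500m","memory":"64Mi"}}}]}}'
  kubectl get pod test-strip-cpu -o jsonpath='{.spec.containers[0].resources}'
  kubectl delete pod test-strip-cpu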

  kubectl rollout restart deploy/kubernetes-dashboard-metrics-scraper
    → new pod.resources = {limits:{memory:400Mi}, requests:{cpu:100m,memory:200Mi}}
    → cluster-wide count of pods with CPU limits: 22 → 21
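
One way to compute that count, as a sketch (regular containers only;
initContainers would need the same treatment):

  kubectl get pods -A -o json | jq \
    '[.items[] | select(any(.spec.containers[]; .resources.limits.cpu != null))] | length'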

Rollout
-------
The remaining 21 pods will drop their CPU limits on natural churn. No
manual restarts are included in this change; if a mass restart is
wanted, it can be timed with a maintenance window (sketch below).
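
A rough sketch of such a mass restart (the namespaces are assumptions
and should be checked against the actual workloads):

  for ns in crowdsec kube-system kubernetes-dashboard gpu-operator; do
    kubectl rollout restart deployment -n "$ns"
    kubectl rollout restart daemonset  -n "$ns"
  done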

Closes: code-eaf
Closes: code-4bz

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -1025,3 +1025,131 @@ resource "kubernetes_manifest" "cleanup_failed_pods" {
}
}
}
# -----------------------------------------------------------------------------
# Strip CPU Limits (Kyverno Mutate)
# -----------------------------------------------------------------------------
# Removes resources.limits.cpu from every container and initContainer at pod
# admission. Memory limits are preserved. Cluster policy: CFS throttling causes
# more harm than good for bursty single-threaded workloads (Node.js, Python
# apps). Upstream Helm charts (CrowdSec, descheduler, kubernetes-dashboard,
# nvidia gpu-operator) still ship CPU limits; this strips them declaratively
# so we don't have to fork values.yaml per chart.
#
# Scope: admission-time only. Existing pods keep their limits until restarted
# naturally (Helm upgrade, node drain, rollout). No mutateExistingOnPolicyUpdate.
#
# JSON6902 remove op fails on missing paths; a per-element precondition gates
# the mutation so pods without CPU limits pass through untouched.
resource "kubernetes_manifest" "mutate_strip_cpu_limits" {
manifest = {
apiVersion = "kyverno.io/v1"
kind = "ClusterPolicy"
metadata = {
name = "strip-cpu-limits"
annotations = {
"policies.kyverno.io/title" = "Strip CPU Limits"
"policies.kyverno.io/description" = join("", [
"Removes resources.limits.cpu from every container and initContainer ",
"at pod admission. Memory limits are preserved. Cluster policy: CFS ",
"throttling causes more harm than good for bursty single-threaded ",
"workloads (Node.js, Python apps).",
])
}
}
spec = {
background = false
rules = [
{
name = "strip-container-cpu-limit"
match = {
any = [
{
resources = {
kinds = ["Pod"]
operations = ["CREATE"]
}
}
]
}
preconditions = {
all = [
{
key = "{{ request.object.spec.containers[?resources.limits.cpu != null] | length(@) }}"
operator = "GreaterThan"
value = 0
}
]
}
mutate = {
foreach = [
{
list = "request.object.spec.containers"
preconditions = {
all = [
{
key = "{{ element.resources.limits.cpu || '' }}"
operator = "NotEquals"
value = ""
}
]
}
patchesJson6902 = yamlencode([
{
op = "remove"
path = "/spec/containers/{{ elementIndex }}/resources/limits/cpu"
}
])
}
]
}
},
{
name = "strip-initcontainer-cpu-limit"
match = {
any = [
{
resources = {
kinds = ["Pod"]
operations = ["CREATE"]
}
}
]
}
preconditions = {
all = [
{
key = "{{ request.object.spec.initContainers[?resources.limits.cpu != null] || `[]` | length(@) }}"
operator = "GreaterThan"
value = 0
}
]
}
mutate = {
foreach = [
{
list = "request.object.spec.initContainers"
preconditions = {
all = [
{
key = "{{ element.resources.limits.cpu || '' }}"
operator = "NotEquals"
value = ""
}
]
}
patchesJson6902 = yamlencode([
{
op = "remove"
path = "/spec/initContainers/{{ elementIndex }}/resources/limits/cpu"
}
])
}
]
}
},
]
}
}
}