kyverno: strip resources.limits.cpu cluster-wide via ClusterPolicy

Context
-------
The cluster-wide stance is "no CPU limits anywhere": CFS throttling
causes more harm than good for bursty single-threaded workloads
(Node.js, Python). LimitRanges are already correct (defaultRequest.cpu
only, no
default.cpu), but 22 pods still carried CPU limits injected by upstream
Helm chart defaults — CrowdSec (lapi + agents), descheduler,
kubernetes-dashboard (×4), nvidia gpu-operator.

Previous attempts were ad-hoc: patch each values.yaml, occasionally
missing things on chart upgrade. This replaces that with a declarative
Kyverno mutation at admission time.

This change
-----------
Adds a new ClusterPolicy `strip-cpu-limits` with two foreach rules:

  strip-container-cpu-limit      → containers[]
  strip-initcontainer-cpu-limit  → initContainers[]

Each rule uses `patchesJson6902` with an `op: remove` on
`resources/limits/cpu`. JSON6902 `remove` fails on missing paths, so a
per-element precondition gates the mutation: pods without CPU limits
pass through untouched. A top-level rule precondition short-circuits
with a JMESPath filter (`[?resources.limits.cpu != null] | length(@)`,
compared against 0), so the rule is a no-op for the overwhelming
majority of pods.
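
The mutation can also be exercised offline with the Kyverno CLI before
it ever reaches the cluster. A minimal sketch, assuming the CLI is
installed and the ClusterPolicy plus a sample pod have been rendered to
plain YAML (file names are placeholders):

  # dry-run the policy against the pod; prints the mutated resource
  kyverno apply strip-cpu-limits.yaml --resource pod-with-cpu-limit.yaml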

Admission-time only. No `mutateExistingOnPolicyUpdate`, no `background`.
Existing pods keep their CPU limits until they're restarted naturally
(Helm upgrade, node drain, rollout). We rely on churn, not forced
restarts, to avoid unnecessary thrash.

Memory limits are preserved — they prevent OOM, still useful.

Flow
----

    admission request → match Pod + CREATE
                     → top-level precondition: any container has limits.cpu?
                           no  → skip (fast path)
                           yes → foreach container:
                                   element.limits.cpu present?
                                       no  → skip element
                                       yes → remove /spec/containers/N/resources/limits/cpu
                     → same again for initContainers
                     → mutated pod proceeds to API server
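
The flow can be exercised without persisting anything via a server-side
dry run, since admission webhooks should still be invoked for dry-run
requests. A sketch (manifest name is a placeholder):

  kubectl apply -f pod-with-cpu-limit.yaml --dry-run=server -o yaml \
    | grep -A4 'resources:'   # CPU limit should already be gone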

Verification
------------
  kubectl run test-strip-cpu --overrides='{limits:{cpu:500m,memory:64Mi}}'
    → admitted pod.resources = {limits:{memory:64Mi}, requests:{cpu:50m,memory:32Mi}}
    → CPU limit stripped, memory preserved, requests untouched
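
The override above is abbreviated; a full invocation looks roughly like
this (image is a placeholder; requests are filled in by the namespace
LimitRange):

  kubectl run test-strip-cpu --image=nginx --restart=Never \
    --overrides='{"apiVersion":"v1","spec":{"containers":[{"name":"test-strip-cpu","image":"nginx","resources":{"limits":{"cpu":"500m","memory":"64Mi"}}}]}}'
  kubectl get pod test-strip-cpu -o jsonpath='{.spec.containers[0].resources}'
  kubectl delete pod test-strip-cpu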

  kubectl rollout restart deploy/kubernetes-dashboard-metrics-scraper
    → new pod.resources = {limits:{memory:400Mi}, requests:{cpu:100m,memory:200Mi}}
    → cluster-wide count of pods with CPU limits: 22 → 21
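
One way to compute that count, as a sketch (regular containers only;
initContainers would need the same treatment):

  kubectl get pods -A -o json | jq \
    '[.items[] | select(any(.spec.containers[]; .resources.limits.cpu != null))] | length'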

Rollout
-------
The remaining 21 pods will drop their CPU limits on natural churn. No
manual restarts are included in this change; if a mass restart is
wanted, it can be timed with a maintenance window (sketch below).
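
A rough sketch of such a mass restart (the namespaces are assumptions
and should be checked against the actual workloads):

  for ns in crowdsec kube-system kubernetes-dashboard gpu-operator; do
    kubectl rollout restart deployment -n "$ns"
    kubectl rollout restart daemonset  -n "$ns"
  done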

Closes: code-eaf
Closes: code-4bz

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -1025,3 +1025,131 @@ resource "kubernetes_manifest" "cleanup_failed_pods" {
}
}
}
# -----------------------------------------------------------------------------
# Strip CPU Limits (Kyverno Mutate)
# -----------------------------------------------------------------------------
# Removes resources.limits.cpu from every container and initContainer at pod
# admission. Memory limits are preserved. Cluster policy: CFS throttling causes
# more harm than good for bursty single-threaded workloads (Node.js, Python
# apps). Upstream Helm charts (CrowdSec, descheduler, kubernetes-dashboard,
# nvidia gpu-operator) still ship CPU limits; this strips them declaratively
# so we don't have to fork values.yaml per chart.
#
# Scope: admission-time only. Existing pods keep their limits until restarted
# naturally (Helm upgrade, node drain, rollout). No mutateExistingOnPolicyUpdate.
#
# JSON6902 remove op fails on missing paths; a per-element precondition gates
# the mutation so pods without CPU limits pass through untouched.
resource "kubernetes_manifest" "mutate_strip_cpu_limits" {
manifest = {
apiVersion = "kyverno.io/v1"
kind = "ClusterPolicy"
metadata = {
name = "strip-cpu-limits"
annotations = {
"policies.kyverno.io/title" = "Strip CPU Limits"
"policies.kyverno.io/description" = join("", [
"Removes resources.limits.cpu from every container and initContainer ",
"at pod admission. Memory limits are preserved. Cluster policy: CFS ",
"throttling causes more harm than good for bursty single-threaded ",
"workloads (Node.js, Python apps).",
])
}
}
spec = {
background = false
rules = [
{
name = "strip-container-cpu-limit"
match = {
any = [
{
resources = {
kinds = ["Pod"]
operations = ["CREATE"]
}
}
]
}
preconditions = {
all = [
{
key = "{{ request.object.spec.containers[?resources.limits.cpu != null] | length(@) }}"
operator = "GreaterThan"
value = 0
}
]
}
mutate = {
foreach = [
{
list = "request.object.spec.containers"
preconditions = {
all = [
{
key = "{{ element.resources.limits.cpu || '' }}"
operator = "NotEquals"
value = ""
}
]
}
patchesJson6902 = yamlencode([
{
op = "remove"
path = "/spec/containers/{{ elementIndex }}/resources/limits/cpu"
}
])
}
]
}
},
{
name = "strip-initcontainer-cpu-limit"
match = {
any = [
{
resources = {
kinds = ["Pod"]
operations = ["CREATE"]
}
}
]
}
preconditions = {
all = [
{
key = "{{ request.object.spec.initContainers[?resources.limits.cpu != null] || `[]` | length(@) }}"
operator = "GreaterThan"
value = 0
}
]
}
mutate = {
foreach = [
{
list = "request.object.spec.initContainers"
preconditions = {
all = [
{
key = "{{ element.resources.limits.cpu || '' }}"
operator = "NotEquals"
value = ""
}
]
}
patchesJson6902 = yamlencode([
{
op = "remove"
path = "/spec/initContainers/{{ elementIndex }}/resources/limits/cpu"
}
])
}
]
}
},
]
}
}
}