Context
-------
The cluster policy is "no CPU limits anywhere" — CFS throttling causes
more harm than good for bursty single-threaded workloads (Node.js,
Python). LimitRanges are already correct (defaultRequest.cpu only, no
default.cpu), but 22 pods still carried CPU limits injected by upstream
Helm chart defaults — CrowdSec (lapi + agents), descheduler,
kubernetes-dashboard (×4), nvidia gpu-operator.
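For reference, a LimitRange of the shape described above might look
roughly like the sketch below. The name, namespace, and the 50m value
are assumptions for illustration, not the cluster's actual settings.

```yaml
# Hypothetical LimitRange: sets a default CPU request but no default CPU limit.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests   # hypothetical name
  namespace: example       # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 50m           # assumed value; applied when a container sets no CPU request
      # No `default.cpu` entry, so containers that omit a CPU limit stay unlimited.
```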
Previous fixes were ad hoc: each chart's values.yaml was patched by
hand, and overrides were occasionally missed on chart upgrade. This
change replaces that with a declarative Kyverno mutation at admission
time.
This change
-----------
Adds a new ClusterPolicy `strip-cpu-limits` with two foreach rules:
  strip-container-cpu-limit     → containers[]
  strip-initcontainer-cpu-limit → initContainers[]
Each rule uses `patchesJson6902` with an `op: remove` on
`resources/limits/cpu`. A JSON Patch (RFC 6902) `remove` fails when the
target path is missing, so per-element preconditions gate the mutation:
containers without a CPU limit are skipped, and pods without any pass
through untouched. A rule-level precondition short-circuits first, using
a JMESPath filter (`[?resources.limits.cpu != null] | length(@) > 0`),
so the rule is skipped for the overwhelming majority of pods.
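A minimal sketch of the first rule follows, reconstructed from the
description above rather than copied from the committed policy. The
exact precondition keys, the `GreaterThan` form of the rule-level
check, and `background: false` are assumptions. The initContainers rule
would mirror this one against `spec.initContainers`.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: strip-cpu-limits
spec:
  background: false   # assumption: admission-time only, no background processing
  rules:
    - name: strip-container-cpu-limit
      match:
        any:
          - resources:
              kinds:
                - Pod
              operations:
                - CREATE
      # Rule-level fast path: skip pods where no container sets limits.cpu.
      preconditions:
        all:
          - key: "{{ request.object.spec.containers[?resources.limits.cpu != null] | length(@) }}"
            operator: GreaterThan
            value: 0
      mutate:
        foreach:
          - list: "request.object.spec.containers"
            # Per-element gate: `remove` would fail on a missing path.
            preconditions:
              all:
                - key: "{{ element.resources.limits.cpu || '' }}"
                  operator: NotEquals
                  value: ""
            patchesJson6902: |-
              - op: remove
                path: /spec/containers/{{elementIndex}}/resources/limits/cpu
```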
Admission-time only. No `mutateExistingOnPolicyUpdate`, no `background`.
Existing pods keep their CPU limits until they're restarted naturally
(Helm upgrade, node drain, rollout). We rely on churn, not forced
restarts, to avoid unnecessary thrash.
Memory limits are preserved: they still cap runaway memory use before
it can exhaust a node.
Flow
----
admission request → match Pod + CREATE
  → top-level precondition: any container has resources.limits.cpu?
      no  → skip (fast path)
      yes → foreach container:
              element.resources.limits.cpu present?
                no  → skip element
                yes → remove /spec/containers/N/resources/limits/cpu
  → same again for initContainers
  → mutated pod proceeds to API server
Verification
------------
kubectl run test-strip-cpu --overrides='{limits:{cpu:500m,memory:64Mi}}'
  → admitted pod.resources = {limits:{memory:64Mi}, requests:{cpu:50m,memory:32Mi}}
  → CPU limit stripped, memory preserved, requests untouched
kubectl rollout restart deploy/kubernetes-dashboard-metrics-scraper
  → new pod.resources = {limits:{memory:400Mi}, requests:{cpu:100m,memory:200Mi}}
  → cluster-wide count of pods with CPU limits: 22 → 21
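The first check above is shown in abbreviated form. An equivalent test
pod, written out as a manifest, might look like the sketch below. The
container name, image, and command are hypothetical, and setting the
requests explicitly (rather than relying on defaults) is an assumption.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-strip-cpu
spec:
  restartPolicy: Never
  containers:
    - name: test               # hypothetical container name
      image: busybox:1.36      # hypothetical image
      command: ["sleep", "60"]
      resources:
        requests:
          cpu: 50m             # assumed to be set explicitly
          memory: 32Mi
        limits:
          cpu: 500m            # expected to be removed at admission
          memory: 64Mi         # expected to be preserved
```

After applying it, `kubectl get pod test-strip-cpu -o yaml` should show
the container's `resources.limits` with only the memory entry.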
Rollout
-------
The remaining 21 pods will drop their CPU limits on natural churn. This
change includes no manual restarts; a mass restart, if wanted, can be
timed with a maintenance window.
Closes: code-eaf
Closes: code-4bz
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>