infra

Viktor Barzin 1de2ee307f	2026-04-18 11:34:39 +00:00

kyverno: strip resources.limits.cpu cluster-wide via ClusterPolicy

Context
-------
The cluster policy is "no CPU limits anywhere": CFS throttling causes more harm than good for bursty single-threaded workloads (Node.js, Python). LimitRanges are already correct (defaultRequest.cpu only, no default.cpu), but 22 pods still carried CPU limits injected by upstream Helm chart defaults: CrowdSec (lapi + agents), descheduler, kubernetes-dashboard (×4), nvidia gpu-operator.

Previous attempts were ad-hoc: patch each values.yaml, occasionally missing things on chart upgrade. This replaces that with a declarative Kyverno mutation at admission time.

This change
-----------
Adds a new ClusterPolicy `strip-cpu-limits` with two foreach rules:

  strip-container-cpu-limit     → containers[]
  strip-initcontainer-cpu-limit → initContainers[]

Each rule uses `patchesJson6902` with an `op: remove` on `resources/limits/cpu`. JSON6902 `remove` fails on missing paths, so per-element preconditions gate the mutation; pods without CPU limits pass through untouched. A top-level rule precondition short-circuits using a JMESPath filter (`[?resources.limits.cpu != null] | length(@) > 0`) so the mutation is a no-op for the overwhelming majority of pods.

Admission-time only. No `mutateExistingOnPolicyUpdate`, no `background`. Existing pods keep their CPU limits until they're restarted naturally (Helm upgrade, node drain, rollout). We rely on churn, not forced restarts, to avoid unnecessary thrash.

Memory limits are preserved: they prevent OOM, so they are still useful.

Flow
----
  admission request
    → match Pod + CREATE
    → top-level precondition: any container has limits.cpu?
        no  → skip (fast path)
        yes → foreach container: element.limits.cpu present?
                no  → skip element
                yes → remove /spec/containers/N/resources/limits/cpu
    → same again for initContainers
    → mutated pod proceeds to API server

Verification
------------
  kubectl run test-strip-cpu --overrides='{limits:{cpu:500m,memory:64Mi}}'
    → admitted pod.resources = {limits:{memory:64Mi}, requests:{cpu:50m,memory:32Mi}}
    → CPU limit stripped, memory preserved, requests untouched

  kubectl rollout restart deploy/kubernetes-dashboard-metrics-scraper
    → new pod.resources = {limits:{memory:400Mi}, requests:{cpu:100m,memory:200Mi}}
    → cluster-wide count of pods with CPU limits: 22 → 21

Rollout
-------
Remaining 21 pods will drop their CPU limits on natural churn. No manual restarts in this change; user may want to time a mass restart with a maintenance window.

Closes: code-eaf
Closes: code-4bz
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
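
The flow described in the commit message can be modeled outside the cluster. The following is a minimal Python sketch, not the Kyverno policy itself: the function names `strip_cpu_limits` and `_has_cpu_limit` are hypothetical, and the pod is assumed to be a plain dict shaped like a Pod manifest. It mirrors the two gates (a per-list check, then a per-element check) and emits JSON6902-style `remove` operations only for paths that exist, which is the reason the preconditions are needed.

```python
import copy


def _has_cpu_limit(container):
    """True if the container dict carries resources.limits.cpu."""
    limits = (container.get("resources") or {}).get("limits") or {}
    return limits.get("cpu") is not None


def strip_cpu_limits(pod):
    """Return (mutated_pod, patches) without modifying the input pod.

    patches is a list of JSON6902-style remove operations, one per
    container or initContainer that had a CPU limit.
    """
    patches = []
    for field in ("containers", "initContainers"):
        containers = (pod.get("spec") or {}).get(field) or []
        # List-level gate: skip the whole list if nothing has limits.cpu.
        if not any(_has_cpu_limit(c) for c in containers):
            continue
        for index, container in enumerate(containers):
            # Element-level gate: a remove on a missing path would fail.
            if _has_cpu_limit(container):
                patches.append({
                    "op": "remove",
                    "path": f"/spec/{field}/{index}/resources/limits/cpu",
                })

    mutated = copy.deepcopy(pod)
    for patch in patches:
        # Path layout: "", "spec", <field>, <index>, "resources", "limits", "cpu"
        _, _, field, index, *_ = patch["path"].split("/")
        del mutated["spec"][field][int(index)]["resources"]["limits"]["cpu"]
    return mutated, patches
```

Memory limits and all requests are left alone because only the `cpu` key under `limits` is ever deleted, matching the behavior reported in the verification section.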
..
modules/kyverno	kyverno: strip resources.limits.cpu cluster-wide via ClusterPolicy	2026-04-18 11:34:39 +00:00
main.tf	extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip]	2026-03-17 21:34:11 +00:00
secrets	extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip]	2026-03-17 21:34:11 +00:00
terragrunt.hcl	extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip]	2026-03-17 21:34:11 +00:00