infra/stacks/kyverno
Viktor Barzin bb9d8f1b38 kyverno: GPU priority mutate uses add (was replace) — fixes silent skip
The Layer 5 ClusterPolicy inject-gpu-workload-priority used JSON6902
op=replace on /spec/priorityClassName. Incoming pods (e.g. frigate)
have no priorityClassName field at all — replace requires the path to
exist, so the patch fails with "doc is missing key: /spec/priorityClassName"
and the whole mutation chain aborts BEFORE Layer 4 (inject-priority-class-from-tier)
gets a chance to add the field.

Result: GPU pods never got priorityClassName set, sat at priority=0, and
could not preempt lower-tier pods on the GPU node. Observed today on
frigate post-node4-recovery — pod stayed Pending with "Preemption is
not helpful" while 3 pg-cluster pods (tier-1-cluster, priority 800000)
occupied node1's memory budget.

Fix: op=add for all three paths. add works whether or not the key is
present, so the policy is robust to the upstream pod shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 09:04:51 +00:00
..
modules/kyverno kyverno: GPU priority mutate uses add (was replace) — fixes silent skip 2026-05-26 09:04:51 +00:00
main.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
secrets extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
terragrunt.hcl extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00