kyverno: GPU priority mutate uses add (was replace) — fixes silent skip

The Layer 5 ClusterPolicy inject-gpu-workload-priority used JSON6902
op=replace on /spec/priorityClassName. Incoming pods (e.g. frigate)
have no priorityClassName field at all — replace requires the path to
exist, so the patch fails with "doc is missing key: /spec/priorityClassName"
and the whole mutation chain aborts BEFORE Layer 4 (inject-priority-class-from-tier)
gets a chance to add the field.
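Below is a minimal sketch of the failure mode. It assumes a toy JSON Patch applier that handles only `add` and `replace` on object members, with no array indices and no `~` escapes. It also assumes a simplified chain in which the first patch error stops every later layer, as described above. `apply_patch`, `gpu_layer`, `tier_layer` and the `tier-fallback` class are hypothetical names, not Kyverno internals.

```python
import copy


class PatchError(Exception):
    """Raised when a JSON Patch op cannot be applied to the document."""


def apply_patch(doc, ops):
    """Toy applier: add/replace on object members only, all-or-nothing on a copy."""
    out = copy.deepcopy(doc)
    for op in ops:
        if op["op"] not in ("add", "replace"):
            raise PatchError(f"unsupported op: {op['op']}")
        *parents, key = op["path"].split("/")[1:]
        target = out
        for token in parents:
            target = target[token]
        if op["op"] == "replace" and key not in target:
            # RFC 6902: replace requires the target location to exist.
            raise PatchError(f"doc is missing key: {op['path']}")
        # add creates the member, or overwrites it if it is already there.
        target[key] = op["value"]
    return out


def gpu_layer(op_name):
    """Stand-in for Layer 5: set the GPU class using the given op."""
    ops = [{"op": op_name, "path": "/spec/priorityClassName", "value": "gpu-workload"}]
    return lambda pod: apply_patch(pod, ops)


def tier_layer(pod):
    """Hypothetical Layer 4 fallback: fill the class only if still unset."""
    if "priorityClassName" in pod["spec"]:
        return pod
    return apply_patch(
        pod, [{"op": "add", "path": "/spec/priorityClassName", "value": "tier-fallback"}]
    )


def run_chain(pod, layers):
    """Assumed chain semantics: a PatchError in one layer aborts all later ones."""
    for layer in layers:
        pod = layer(pod)  # an exception here means no later layer runs
    return pod


frigate = {"metadata": {"name": "frigate"}, "spec": {"containers": []}}

try:
    run_chain(frigate, [gpu_layer("replace"), tier_layer])
except PatchError as err:
    print("aborted:", err)  # aborted: doc is missing key: /spec/priorityClassName

print(run_chain(frigate, [gpu_layer("add"), tier_layer])["spec"]["priorityClassName"])
# gpu-workload
```

With replace, neither layer leaves a class on the pod. With add, Layer 5 sets gpu-workload and the assumed Layer 4 fallback leaves it alone.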

Result: GPU pods never had priorityClassName set, stayed at priority=0, and
could not preempt lower-tier pods on the GPU node. Observed today on
frigate after node4 recovered: the pod stayed Pending with "Preemption is
not helpful" while 3 pg-cluster pods (tier-1-cluster, priority 800000)
occupied node1's memory budget.
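The preemption side comes down to one comparison. The sketch models only the priority rule: a pending pod can preempt only pods with strictly lower priority, and it never preempts when its preemptionPolicy is Never. The real scheduler also checks whether evicting the victims would actually let the pod fit, and it honours PodDisruptionBudgets. The `Pod` dataclass, `preemption_candidates` and the pg-cluster pod names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    # No priorityClassName and no globalDefault PriorityClass -> priority 0.
    priority: int = 0
    preemption_policy: str = "PreemptLowerPriority"


def preemption_candidates(pending, running):
    """Simplified rule: only strictly lower-priority pods can be evicted."""
    if pending.preemption_policy == "Never":
        return []
    return [p.name for p in running if p.priority < pending.priority]


# Hypothetical names for the three tier-1-cluster pods on the node.
node_pods = [Pod(f"pg-cluster-{i}", priority=800000) for i in range(3)]

print(preemption_candidates(Pod("frigate"), node_pods))  # []
print(preemption_candidates(Pod("frigate", priority=1200000), node_pods))
# ['pg-cluster-0', 'pg-cluster-1', 'pg-cluster-2']
```

At priority 0 the candidate list is empty, which is consistent with the "Preemption is not helpful" outcome. At 1200000 all three pods become candidates.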

Fix: use op=add for all three paths. For an object member, add creates
the key if it is absent and overwrites the value if it is present, so the
policy is robust to the upstream pod shape.
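This is a quick check of the "robust to the upstream pod shape" claim. It applies the three add ops from the diff to a pod that lacks the fields and to one that already sets them. `apply_add` is a hypothetical add-only applier for object members that follows RFC 6902 add semantics (create the member, or overwrite it if present). It is not Kyverno's patch engine, and the `preset` pod values are made up.

```python
import copy

# The three ops from the policy, all using op=add.
GPU_PATCH = [
    {"op": "add", "path": "/spec/priorityClassName", "value": "gpu-workload"},
    {"op": "add", "path": "/spec/priority", "value": 1200000},
    {"op": "add", "path": "/spec/preemptionPolicy", "value": "PreemptLowerPriority"},
]


def apply_add(doc, ops):
    """Hypothetical add-only JSON Patch applier for object members."""
    out = copy.deepcopy(doc)
    for op in ops:
        if op["op"] != "add":
            raise ValueError(f"only add is supported, got {op['op']}")
        *parents, key = op["path"].split("/")[1:]
        target = out
        for token in parents:
            target = target[token]
        target[key] = op["value"]  # creates the member, or overwrites it
    return out


# Shape 1: no priority fields at all (like frigate).
bare = {"spec": {"containers": []}}
# Shape 2: fields already present with other values (made-up example).
preset = {"spec": {"containers": [], "priorityClassName": "default",
                   "priority": 0, "preemptionPolicy": "Never"}}

print(apply_add(bare, GPU_PATCH) == apply_add(preset, GPU_PATCH))  # True
print(apply_add(bare, GPU_PATCH)["spec"]["priority"])  # 1200000
```

Both shapes converge on the same spec. Re-applying the patch is also a no-op, which suits a mutation that may run more than once.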

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-26 09:04:51 +00:00
parent 12b4f6f81a
commit bb9d8f1b38

@@ -925,19 +925,24 @@ resource "kubectl_manifest" "mutate_gpu_priority" {
       ]
     }
     mutate = {
+      # `op=add` (not replace): incoming pods often lack the
+      # `/spec/priorityClassName` key entirely; replace fails with
+      # "doc is missing key" and aborts the mutation chain BEFORE
+      # Layer 4 (tier injection) can fall back. add works whether
+      # the path exists or not. Verified 2026-05-26 on frigate.
       patchesJson6902 = yamlencode([
         {
-          op    = "replace"
+          op    = "add"
           path  = "/spec/priorityClassName"
           value = "gpu-workload"
         },
         {
-          op    = "replace"
+          op    = "add"
           path  = "/spec/priority"
           value = 1200000
         },
         {
-          op    = "replace"
+          op    = "add"
           path  = "/spec/preemptionPolicy"
           value = "PreemptLowerPriority"
         }