kyverno: GPU priority mutate uses add (was replace) — fixes silent skip
The Layer 5 ClusterPolicy inject-gpu-workload-priority used JSON6902 op=replace on /spec/priorityClassName. Incoming pods (e.g. frigate) have no priorityClassName field at all. replace requires the path to exist, so the patch fails with "doc is missing key: /spec/priorityClassName" and the whole mutation chain aborts BEFORE Layer 4 (inject-priority-class-from-tier) gets a chance to add the field.

Result: GPU pods never got priorityClassName set, sat at priority=0, and could not preempt lower-tier pods on the GPU node. Observed today on frigate post-node4-recovery: the pod stayed Pending with "Preemption is not helpful" while 3 pg-cluster pods (tier-1-cluster, priority 800000) occupied node1's memory budget.

Fix: op=add for all three paths. add works whether or not the key is present, so the policy is robust to the upstream pod shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
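
The difference between the two operations can be illustrated with a minimal Python sketch. It is a simplified stand-in for RFC 6902 semantics on object members only, not Kyverno's actual patch engine; `apply_op`, `PatchError`, and `bare_pod` are hypothetical names introduced for illustration, and the error text merely mirrors the message quoted above.

```python
import copy


class PatchError(Exception):
    """Raised when a patch operation cannot be applied."""


def apply_op(doc, op):
    """Apply one simplified JSON Patch operation to a nested dict.

    Assumption: only object members are handled. Arrays, the "-" index
    and "~0"/"~1" escaping from RFC 6901 are deliberately left out.
    Returns a patched copy; the input document is not modified.
    """
    keys = op["path"].split("/")[1:]  # "/spec/priority" -> ["spec", "priority"]
    result = copy.deepcopy(doc)
    parent = result
    for key in keys[:-1]:
        # Both add and replace need the parent object to exist already.
        if key not in parent:
            raise PatchError(f"doc is missing path: {op['path']}")
        parent = parent[key]
    leaf = keys[-1]
    if op["op"] == "replace":
        # replace is only valid when the target member is already present.
        if leaf not in parent:
            raise PatchError(f"doc is missing key: {op['path']}")
        parent[leaf] = op["value"]
    elif op["op"] == "add":
        # add creates the member, or overwrites it if it already exists.
        parent[leaf] = op["value"]
    else:
        raise PatchError(f"unsupported op: {op['op']}")
    return result


# A pod spec with no priorityClassName, like the incoming frigate pod.
bare_pod = {"spec": {"containers": []}}

for kind in ("replace", "add"):
    op = {"op": kind, "path": "/spec/priorityClassName", "value": "gpu-workload"}
    try:
        print(kind, "->", apply_op(bare_pod, op)["spec"])
    except PatchError as err:
        print(kind, "-> failed:", err)
```

Note that add still requires the parent object (`/spec` here) to exist, which always holds for a pod.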
parent 12b4f6f81a
commit bb9d8f1b38
1 changed file with 8 additions and 3 deletions
@@ -925,19 +925,24 @@ resource "kubectl_manifest" "mutate_gpu_priority" {
       ]
     }
     mutate = {
+      # `op=add` (not replace) — incoming pods often lack the
+      # `/spec/priorityClassName` key entirely; replace fails with
+      # "doc is missing key" and aborts the mutation chain BEFORE
+      # Layer 4 (tier injection) can fall back. add works whether
+      # the path exists or not. Verified 2026-05-26 on frigate.
       patchesJson6902 = yamlencode([
         {
-          op = "replace"
+          op = "add"
           path = "/spec/priorityClassName"
           value = "gpu-workload"
         },
         {
-          op = "replace"
+          op = "add"
           path = "/spec/priority"
           value = 1200000
         },
         {
-          op = "replace"
+          op = "add"
           path = "/spec/preemptionPolicy"
           value = "PreemptLowerPriority"
         }