The Layer 5 ClusterPolicy inject-gpu-workload-priority used JSON6902
op=replace on /spec/priorityClassName. Incoming pods (e.g. frigate)
have no priorityClassName field at all — replace requires the path to
exist, so the patch fails with "doc is missing key: /spec/priorityClassName"
and the whole mutation chain aborts BEFORE Layer 4 (inject-priority-class-from-tier)
gets a chance to add the field.
Result: GPU pods never got priorityClassName set, sat at priority=0, and
could not preempt lower-tier pods on the GPU node. Observed today on
frigate post-node4-recovery — pod stayed Pending with "Preemption is
not helpful" while 3 pg-cluster pods (tier-1-cluster, priority 800000)
occupied node1's memory budget.
Fix: op=add for all three paths. add works whether or not the key is
present, so the policy is robust to the upstream pod shape.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>