Each build pod has 8-10 containers inheriting 1 CPU / 2Gi limits from
LimitRange defaults. With 4+ concurrent builds the old quota (48 CPU /
96Gi / 30 pods) was exhausted, blocking new builds. Increase to 64 CPU /
128Gi / 60 pods to safely support 5-6 concurrent builds.
Added `tier = var.tier` to kubernetes_namespace labels in ~73 service
modules. This enables Kyverno to generate LimitRange defaults,
ResourceQuotas, and PriorityClass injection for all namespaces.
Previously only 11 namespaces had tier labels; now all 80 active
namespaces are labeled. All pods restarted in rolling waves to pick
up the new policies.
Add nvidia.com/gpu toleration to all GPU workloads (frigate, ollama)
to support NoSchedule taint on GPU nodes. Update nvidia operator
helm values with daemonset tolerations. Enhance GPU pod memory
exporter with Kubernetes API integration to resolve container IDs
to pod names/namespaces, adding RBAC resources for API access.