monitoring: right-size loki memory request 3Gi->1Gi (quota 89%->79%)

monitoring-quota requests.memory sat at 89% (18.2/20Gi), tripping the
ResourceQuota>80% WARN. Root cause was over-provisioned requests, not real
usage: loki requested 3Gi, but its VPA upperBound is 364Mi and actual usage
is ~315Mi. prometheus's 4Gi is legitimately required (its 2Gi tmpfs WAL
shares the cgroup, and it OOMs at 3Gi during WAL replay), so it stays;
grafana's main container is already at 512Mi. Trimmed loki to a 1Gi request
(~3x its observed ceiling; the 4Gi limit keeps the pod Burstable and
preserves query-spike headroom), which brings the quota to 78.8% and clears
the WARN. NOTE: the alloy DaemonSet (562Mi/node) grows with node count, so
revisit (bump the 20Gi quota) as the cluster expands.
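
The quota arithmetic above can be sanity-checked with a short sketch. It
works from the stated percentages (78.8% after the change, 20Gi quota, 2Gi
freed) rather than the 18.2Gi figure, since 18.2/20 comes out at 91%, not
89%. It also assumes the 562Mi/node alloy figure is a memory request counted
against this quota, which the message implies but does not state. All
variable names are illustrative.

```python
GI = 1024  # MiB per GiB

quota_mi = 20 * GI           # monitoring-quota requests.memory hard limit
after_pct = 78.8             # quota usage after the change (from the message)
freed_mi = (3 - 1) * GI      # loki request trimmed 3Gi -> 1Gi
alloy_per_node_mi = 562      # ASSUMPTION: alloy request per node, counted in this quota
warn_pct = 80.0              # WARN fires above this percentage


def pct_of_quota(mi: float) -> float:
    """Express a MiB amount as a percentage of the quota."""
    return 100.0 * mi / quota_mi


# Usage before the change, reconstructed from the "after" figure.
before_pct = after_pct + pct_of_quota(freed_mi)

# Request headroom left before usage reaches the 80% WARN line.
headroom_mi = (warn_pct - after_pct) / 100.0 * quota_mi

# Whole extra nodes (one alloy pod each) that fit inside that headroom.
nodes_until_warn = int(headroom_mi // alloy_per_node_mi)

# How far the new 1Gi request sits above the VPA upperBound of 364Mi.
request_to_ceiling = (1 * GI) / 364

print(f"before: {before_pct:.1f}%  after: {after_pct:.1f}%")
print(f"headroom to WARN: {headroom_mi:.0f}Mi -> {nodes_until_warn} extra node(s)")
print(f"1Gi request is {request_to_ceiling:.2f}x the VPA upperBound")
```

Under those assumptions the remaining headroom is roughly 246Mi, less than
one alloy pod's 562Mi, so a single additional node would likely push the
quota back over 80%.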
Viktor Barzin 2026-06-04 11:18:06 +00:00
parent 90d7c11c16
commit dd2a8e640f

@@ -70,7 +70,13 @@ singleBinary:
   resources:
     requests:
       cpu: 250m
-      memory: 3Gi
+      # Right-sized 2026-06-04 (3Gi->1Gi): VPA upperBound 364Mi, actual ~315Mi.
+      # 1Gi request is ~3x the observed ceiling; the 4Gi limit (Burstable)
+      # keeps headroom for query spikes. Frees 2Gi of monitoring-quota
+      # requests.memory, taking it 89%->~79% (under the >80% WARN). NOTE: the
+      # alloy DaemonSet (562Mi/node) grows with node count, so this can creep
+      # back over 80% as the cluster expands; bump the quota then.
+      memory: 1Gi
     limits:
       memory: 4Gi