From dd2a8e640f74b7aa29a91b926a37ae983a8d4218 Mon Sep 17 00:00:00 2001 From: Viktor Barzin Date: Thu, 4 Jun 2026 11:18:06 +0000 Subject: [PATCH] monitoring: right-size loki memory request 3Gi->1Gi (quota 89%->79%) monitoring-quota requests.memory sat at 89% (18.2/20Gi), tripping the ResourceQuota>80% WARN. Root cause was over-provisioned requests, not real usage: loki requested 3Gi but its VPA upperBound is 364Mi and actual ~315Mi. prometheus's 4Gi is legitimately required (2Gi tmpfs WAL shares the cgroup; OOMs at 3Gi during WAL replay) so it stays; grafana's main container is already 512Mi. Trimmed loki to 1Gi request (~3x its observed ceiling; 4Gi Burstable limit preserves query-spike headroom) -> quota 78.8%, clears the WARN. NOTE: alloy DaemonSet (562Mi/node) grows with node count, so revisit (bump the 20Gi quota) as the cluster expands. --- stacks/monitoring/modules/monitoring/loki.yaml | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/stacks/monitoring/modules/monitoring/loki.yaml b/stacks/monitoring/modules/monitoring/loki.yaml index a242affc..d44c8d82 100644 --- a/stacks/monitoring/modules/monitoring/loki.yaml +++ b/stacks/monitoring/modules/monitoring/loki.yaml @@ -70,7 +70,13 @@ singleBinary: resources: requests: cpu: 250m - memory: 3Gi + # Right-sized 2026-06-04 (3Gi->1Gi): VPA upperBound 364Mi, actual ~315Mi. + # 1Gi request is ~3x the observed ceiling; the 4Gi limit (Burstable) + # keeps headroom for query spikes. Frees 2Gi of monitoring-quota + # requests.memory, taking it 89%->~79% (under the >80% WARN). NOTE: the + # alloy DaemonSet (562Mi/node) grows with node count, so this can creep + # back over 80% as the cluster expands — bump the quota then. + memory: 1Gi limits: memory: 4Gi