calico: bump tigera-operator mem limit 256Mi -> 512Mi (OOM crashloop fix)

The operator went into an OOM crashloop on 2026-06-23. It idles at ~246Mi and
spikes to ~266Mi at startup (re-listing resources to build informer caches), so
steady state sat just under the 256Mi limit and the startup spike exceeded it.
Once the pod restarted it could never finish startup (exit 137 OOMKilled,
leader-elect, OOM, repeat). This was a latent landmine: the limit was always too
tight and only bit once the pod restarted. The data plane was never affected
(calico-node 7/7, tigerastatus green throughout). 512Mi gives headroom (now
~246Mi steady, verified stable with 0 restarts). NOT caused by the ESO migration
(which never touched calico); cluster churn was at most the trigger that exposed
the tight limit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-23 12:46:28 +00:00
parent 68c240b8de
commit 98d2b89614

@@ -162,6 +162,11 @@ resource "helm_release" "tigera_operator" {
 # are installed -> "ensure CRDs are installed first". Not needed here.
 goldmane = { enabled = false }
 whisker = { enabled = false }
-resources = { limits = { memory = "256Mi" } }
+# 512Mi (was 256Mi): the operator idles at ~38Mi but its STARTUP spike
+# (re-listing resources to build informer caches) exceeded 256Mi and
+# OOM-crashlooped on 2026-06-23 the first time the pod restarted (a latent
+# landmine: any restart would have triggered it). 512Mi covers the spike;
+# data plane (calico-node) is unaffected by an operator restart.
+resources = { limits = { memory = "512Mi" } }
 })]
 }
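
The headroom arithmetic behind the change can be sketched in Terraform. This is a minimal illustration, not part of the commit: the local and output names are hypothetical, and the numbers are the approximate figures quoted in the commit message (~246Mi idle, ~266Mi startup peak).

```hcl
# Hypothetical locals for illustration only; values are the approximate
# figures from the commit message, in Mi.
locals {
  operator_idle_mi         = 246 # steady-state usage
  operator_startup_peak_mi = 266 # spike while informer caches are built
  old_limit_mi             = 256
  new_limit_mi             = 512
}

# 256 - 266 = -10: the startup spike did not fit under the old limit.
output "old_limit_startup_headroom_mi" {
  value = local.old_limit_mi - local.operator_startup_peak_mi
}

# 512 - 266 = 246: the new limit leaves room above the startup spike.
output "new_limit_startup_headroom_mi" {
  value = local.new_limit_mi - local.operator_startup_peak_mi
}
```

A negative headroom at the startup peak is what turns a single restart into a crashloop: the container is killed before it ever reaches its lower steady-state usage.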