calico: bump tigera-operator mem limit 256Mi -> 512Mi (OOM crashloop fix)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The operator OOM-crashlooped on 2026-06-23. It idles at ~246Mi and has a ~266Mi startup spike (re-listing resources to build informer caches), so idle usage sat just under the 256Mi limit and the spike exceeded it. The first time the pod restarted it could never finish startup (exit 137 OOMKilled, leader-elect, OOM, repeat). This was a latent landmine: the limit was always too tight and only bit once the pod restarted.

Data plane was never affected (calico-node 7/7, tigerastatus green throughout). 512Mi gives headroom (now ~246Mi steady, verified stable with 0 restarts).

NOT caused by the ESO migration (which never touched calico); cluster churn was at most the trigger that exposed the tight limit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 68c240b8de
commit 98d2b89614
1 changed file with 6 additions and 1 deletion
@@ -162,6 +162,11 @@ resource "helm_release" "tigera_operator" {
     # are installed -> "ensure CRDs are installed first". Not needed here.
     goldmane = { enabled = false }
     whisker = { enabled = false }
-    resources = { limits = { memory = "256Mi" } }
+    # 512Mi (was 256Mi): the operator idles at ~246Mi and its STARTUP spike
+    # (re-listing resources to build informer caches) exceeded 256Mi and
+    # OOM-crashlooped on 2026-06-23 the first time the pod restarted (a latent
+    # landmine: any restart would have triggered it). 512Mi covers the spike;
+    # data plane (calico-node) is unaffected by an operator restart.
+    resources = { limits = { memory = "512Mi" } }
   })]
 }
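The headroom reasoning behind the new limit can be checked with a few lines of arithmetic. The following is only an illustrative sketch, not part of the commit: it uses the ~246Mi idle and ~266Mi startup-spike figures quoted in the commit message, and the helper name `headroom_mib` is hypothetical.

```python
# Illustrative only: figures are the approximate values quoted in the
# commit message (MiB); the helper name is hypothetical.
IDLE_MIB = 246           # steady-state usage reported for the operator
STARTUP_SPIKE_MIB = 266  # peak while informer caches are rebuilt


def headroom_mib(limit_mib: int, usage_mib: int) -> int:
    """Return memory left under the limit; negative means usage exceeds it."""
    return limit_mib - usage_mib


for limit in (256, 512):
    idle_room = headroom_mib(limit, IDLE_MIB)
    spike_room = headroom_mib(limit, STARTUP_SPIKE_MIB)
    verdict = "survives startup" if spike_room >= 0 else "OOMKilled during startup"
    print(f"limit={limit}Mi idle_headroom={idle_room}Mi "
          f"spike_headroom={spike_room}Mi -> {verdict}")
```

Under these assumed figures the old limit leaves about 10Mi at idle and is roughly 10Mi short during startup, which matches the crashloop described above, while 512Mi leaves about 246Mi above the spike.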