calico: bump tigera-operator mem limit 256Mi -> 512Mi (OOM crashloop fix)

The operator OOM-crashlooped on 2026-06-23. It idles at ~246Mi and spikes to ~266Mi during startup (re-listing resources to build its informer caches), so both figures sit at or over the 256Mi limit. The first time the pod restarted, it could not get through startup: exit 137 (OOMKilled), leader-elect, OOM, repeat.

This was a latent landmine. The limit was always too tight and only bit once the pod restarted. The data plane was never affected (calico-node 7/7, tigerastatus green throughout).

512Mi gives headroom. The operator now sits at ~246Mi steady and has been verified stable with 0 restarts.

NOT caused by the ESO migration (which never touched calico). Cluster churn was at most the trigger that exposed the tight limit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:46:28 +00:00 · 2026-06-23 12:46:28 +00:00 · 98d2b89614
commit 98d2b89614
parent 68c240b8de
1 changed file with 6 additions and 1 deletion
--- a/stacks/calico/main.tf
+++ b/stacks/calico/main.tf
@@ -162,6 +162,11 @@ resource "helm_release" "tigera_operator" {
    # are installed -> "ensure CRDs are installed first". Not needed here.
    goldmane  = { enabled = false }
    whisker   = { enabled = false }
-    resources = { limits = { memory = "256Mi" } }
+    # 512Mi (was 256Mi): the operator idles at ~38Mi but its STARTUP spike
+    # (re-listing resources to build informer caches) exceeded 256Mi and
+    # OOM-crashlooped on 2026-06-23 the first time the pod restarted (a latent
+    # landmine — any restart would have triggered it). 512Mi covers the spike;
+    # data plane (calico-node) is unaffected by an operator restart.
+    resources = { limits = { memory = "512Mi" } }
  })]
 }
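A Terraform `check` block could stop the headroom from eroding again unnoticed. It would compare the configured limit against the measured startup spike. The sketch below is minimal and assumes Terraform 1.5+, where `check` blocks are available. The `local` names and the 1.5x headroom factor are hypothetical. The 266Mi figure is the startup spike reported in the commit message above. A failed `check` assertion surfaces as a warning during plan/apply, not an error, so this flags a too-tight limit without blocking a rollout.

```hcl
locals {
  # Hypothetical names. 266 is the startup spike (Mi) reported on 2026-06-23;
  # 512 mirrors the new limit set in the helm_release values.
  tigera_operator_startup_spike_mi = 266
  tigera_operator_mem_limit_mi     = 512

  # Assumed policy, not from the source: keep the limit >= 1.5x the observed spike.
  tigera_operator_headroom_factor = 1.5
}

# Reports a warning (not an error) on plan/apply if the limit drops below
# factor * spike (here 1.5 * 266 = 399; 512 passes).
check "tigera_operator_memory_headroom" {
  assert {
    condition     = local.tigera_operator_mem_limit_mi >= local.tigera_operator_headroom_factor * local.tigera_operator_startup_spike_mi
    error_message = "tigera-operator memory limit leaves less than the assumed headroom over its startup spike; it may OOM-crashloop on its next restart."
  }
}
```

As written, the limit is duplicated by hand. Interpolating `"${local.tigera_operator_mem_limit_mi}Mi"` into the `resources` value would keep the two in sync.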