infra

Viktor Barzin 51313ee088 kured: fix sentinel-gate OOM — 256Mi limit + self-restart leak guard	2026-05-31 14:49:04 +00:00

The k8s-master gate pod OOM-killed child kubectls 149x/7d (accelerating: 0/day → 15 → 134) while master sat in pending-reboot.

Root cause: only the pending-reboot node's gate pod runs the kubectl-heavy hot path each cycle, and the immortal bash loop slowly leaks (kubectl forks + Check-4 process substitution) past the 64Mi cgroup limit. PID 1 bash survives each kill, so the pod never restarts, just silent oom_events.

Fix: raise limit 64Mi→256Mi (headroom for ~30-50Mi kubectl forks) + add a MAX_ITER=72 self-exit (~6h) so kubelet restarts the pod fresh and the leak can never accumulate, regardless of how long a node stays pending-reboot.

Docs: post-mortem + automated-upgrades.md gate note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
..
main.tf	kured: fix sentinel-gate OOM — 256Mi limit + self-restart leak guard	2026-05-31 14:49:04 +00:00
secrets	[infra] Adopt kured + sentinel-gate into Terraform (Wave 5a)	2026-04-18 22:33:29 +00:00
terragrunt.hcl	[infra] Adopt kured + sentinel-gate into Terraform (Wave 5a)	2026-04-18 22:33:29 +00:00
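
The self-restart guard described in the latest commit can be pictured with a short shell sketch. This is not the code in main.tf. It is a minimal illustration that assumes a 300-second cycle (72 iterations x 300 s = 6 h, which matches the "~6h" in the commit message). The names `SLEEP_SECS`, `run_gate_checks`, and `gate_loop` are hypothetical; only `MAX_ITER=72` comes from the commit.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bounded gate loop; not the actual sentinel-gate script.

MAX_ITER="${MAX_ITER:-72}"        # from the commit message
SLEEP_SECS="${SLEEP_SECS:-300}"   # assumed cycle length: 72 * 300 s = 6 h

# Placeholder for the real per-cycle checks (the kubectl-heavy hot path).
run_gate_checks() {
  :
}

# Run the checks at most MAX_ITER times, then return so the caller can exit.
gate_loop() {
  gate_iter=0
  while [ "$gate_iter" -lt "$MAX_ITER" ]; do
    run_gate_checks || true       # one failed cycle should not end the loop
    gate_iter=$(( gate_iter + 1 ))
    sleep "$SLEEP_SECS"
  done
  echo "reached MAX_ITER=${MAX_ITER}; exiting so kubelet restarts the container"
}
```

In a container entrypoint this would be followed by `gate_loop; exit 0`. When PID 1 exits, kubelet restarts the container under the pod's restart policy, so any memory the long-lived shell has accumulated is released at a bounded interval instead of growing until the cgroup limit is hit.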