infra/stacks
Viktor Barzin 51313ee088 kured: fix sentinel-gate OOM — 256Mi limit + self-restart leak guard
Child kubectl processes in the k8s-master gate pod were OOM-killed 149 times
in 7 days, at an accelerating rate (0/day → 15 → 134), while master sat in
pending-reboot. Root cause: only the pending-reboot node's gate pod runs the
kubectl-heavy hot path each cycle, and its long-lived bash loop slowly leaks
memory (kubectl forks + Check-4 process substitution) until it exceeds the
64Mi cgroup limit. PID 1 bash survives each kill, so the pod never restarts;
the only symptom is silent oom_events.

Fix: raise limit 64Mi→256Mi (headroom for ~30-50Mi kubectl forks) + add a
MAX_ITER=72 self-exit (~6h) so kubelet restarts the pod fresh and the leak
can never accumulate, regardless of how long a node stays pending-reboot.
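
For illustration only, the self-exit guard described above could look roughly
like the sketch below. This is not the actual gate script: apart from MAX_ITER,
every name here (SLEEP_SECONDS, run_gate_checks, gate_loop) is hypothetical,
and the 300 s interval is an assumption inferred from 72 iterations ≈ 6 h.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bounded gate loop; only MAX_ITER comes from the commit.

MAX_ITER="${MAX_ITER:-72}"             # number of cycles before self-exit
SLEEP_SECONDS="${SLEEP_SECONDS:-300}"  # assumed interval: 72 * 300 s = 6 h

# Placeholder for the kubectl-heavy checks the real gate performs each cycle.
run_gate_checks() {
  :
}

# Run the checks at most MAX_ITER times, then return so the caller can exit.
gate_loop() {
  local iter=0
  while (( iter < MAX_ITER )); do
    run_gate_checks
    iter=$(( iter + 1 ))
    sleep "$SLEEP_SECONDS"
  done
  echo "reached MAX_ITER=${MAX_ITER}; exiting so kubelet restarts the container"
}

# A container entrypoint would end with:
#   gate_loop
#   exit 0
```

Because the loop returns instead of running forever, PID 1 exits after a
bounded number of cycles, and whatever memory the shell accumulated is released
when kubelet starts a fresh container.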

Docs: post-mortem + automated-upgrades.md gate note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-31 14:49:04 +00:00
_template
actualbudget
affine
authentik
beads-server
blog
broker-sync
calico
changedetection
chrome-service
city-guesser
claude-agent-service
claude-memory
cloudflared
cnpg
coturn
crowdsec
cyberchef
dashy
dawarich
dbaas
descheduler
diun
ebook2audiobook
ebooks
echo
excalidraw
external-secrets
f1-stream
fire-planner
forgejo
freedify
freshrss
frigate
grampsweb
hackmd
headscale
health
hermes-agent
homepage
immich
infra
infra-maintenance
insta2spotify
instagram-poster
isponsorblocktv
job-hunter
jsoncrack
k8s-dashboard
k8s-portal
k8s-version-upgrade
keel
kms
kured kured: fix sentinel-gate OOM — 256Mi limit + self-restart leak guard 2026-05-31 14:49:04 +00:00
kyverno
linkwarden
llama-cpp
local-path
mailserver
matrix
meshcentral
metallb
metrics-server
monitoring
n8n
navidrome
netbox
networking-toolbox
nextcloud
nfs-csi
nodelocal-dns
novelapp
ntfy
nvidia
onlyoffice
openclaw
osm_routing
owntracks
paperless-mcp
paperless-ngx
payslip-ingest
phpipam
platform
plotting-book
poison-fountain
postiz
priority-pass
privatebin
proxmox-csi
pvc-autoresizer
rbac
real-estate-crawler
recruiter-responder
redis
reloader
resume
reverse-proxy
rybbit
sealed-secrets
send
servarr
shadowsocks
speedtest
status-page
stirling-pdf
tandoor
technitium
terminal
tor-proxy
trading-bot
traefik
travel-agent travel-agent: switch from Slack webhook to bot token (chat.postMessage) 2026-05-30 22:44:11 +00:00
travel_blog
tripit
tuya-bridge
uptime-kuma
url
vault
vaultwarden
vpa
wealthfolio
webhook_handler
whisper
wireguard
woodpecker
xray
ytdlp