infra/stacks
Viktor Barzin b931d9fb20
ci/woodpecker/push/default Pipeline was successful
k8s-version-upgrade: make tigera-operator restore crash-safe (EXIT trap)
phase_master quiesces tigera-operator (Calico's config reconciler) by scaling it
to 0 replicas around the master upgrade, so it can't crashloop during the
apiserver blip and I/O-storm kubeadm's static-pod-hash watch (which would roll
the upgrade back). The restore was a plain line at the end of the phase, so any
abort AFTER quiescing left the operator at 0, and the idempotent retry then
skipped the already-on-target master phase and never restored it. Observed
2026-06-17: a post-upgrade gate aborted the master attempt; the operator sat
scaled to 0 for ~1.5h (the data plane was fine because calico-node keeps
running, but there was no Calico reconciliation).

Fix:
  - Drain first (drain doesn't blip the apiserver), THEN quiesce right before
    `kubeadm upgrade apply`, and install an EXIT trap that restores the operator
    no matter how the phase exits (gate abort, set -e on ssh/kubeadm, success).
    Trap is set AFTER drain_node so its own EXIT trap can't clobber it; cleared
    after the explicit happy-path restore.
  - postflight also force-restores replicas=1 as a final guarantee (covers the
    skip-on-retry path that never quiesces or restores).
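
The ordering described in the Fix list can be sketched roughly as follows. This is a minimal illustration, not the actual script: `phase_master` and `drain_node` are named in the commit message, but their bodies here are stand-ins, and `scale_operator`, `restore_operator`, `postflight` and `STATE_FILE` are hypothetical names. A temp file plays the role of the operator's replica count, where the real script would presumably run something like `kubectl scale` against the tigera-operator Deployment.

```shell
# Hypothetical stand-in for the operator's replica count (1 = running).
STATE_FILE="$(mktemp)"
echo 1 > "$STATE_FILE"

# Assumed helpers: record the desired replica count instead of calling kubectl.
scale_operator()   { echo "$1" > "$STATE_FILE"; }
restore_operator() { scale_operator 1; }

# Stand-in for the real drain; per the commit message it manages its own
# EXIT trap, which is why the restore trap is installed only after it returns.
drain_node() { :; }

# The body is a subshell, so the EXIT trap fires whenever the phase ends,
# whether by abort or by success.
phase_master() (
  drain_node                    # drain first; it does not blip the apiserver
  trap restore_operator EXIT    # from here on, any exit restores the operator
  scale_operator 0              # quiesce right before the upgrade step
  "$@" || exit 1                # stand-in for `kubeadm upgrade apply` + gates
  restore_operator              # explicit happy-path restore
  trap - EXIT                   # clear the trap once restored
)

# Final guarantee: force replicas back to 1 even if phase_master was skipped.
postflight() { restore_operator; }
```

Calling `phase_master false` simulates a gate abort: the phase returns non-zero, yet the recorded replica count is back at 1 because the trap ran on the way out.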

Long-term fix remains HA control plane (apiserver never goes down) — bead code-n0ow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 18:25:54 +00:00
_template
actualbudget
affine
android-emulator
anisette fix(anisette): wait_for_rollout=false so a slow first start can't strand the deploy out of state 2026-06-14 20:56:30 +00:00
authentik fix(authentik): derive username from email in tripit-enrollment (user_write needs it) 2026-06-17 07:35:23 +00:00
beads-server
blog
broker-sync
calico
changedetection
chrome-service chrome-service + mam-farming: doc clarifications (+ re-trigger CI apply missed earlier) 2026-06-16 09:34:23 +00:00
ci-pipeline-health
city-guesser
claude-agent-service
claude-breakglass
claude-memory
cloudflared
cnpg
coturn
crowdsec
cyberchef
dashy
dawarich
dbaas
descheduler
diun
ebook2audiobook
ebooks
echo
excalidraw
external-secrets
f1-stream
fire-planner
forgejo forgejo: custom 8Gi ResourceQuota (was pegged at the 4Gi tier cap) 2026-06-13 17:16:47 +00:00
freedify
freshrss
frigate
grampsweb
hackmd
headscale
health health: fix middleware ref namespace prefix (restore site from 404) 2026-06-14 17:43:08 +00:00
hermes-agent
homepage
immich Merge remote-tracking branch 'origin/master' into wizard/reconcile-mirror 2026-06-16 22:32:43 +00:00
infra
infra-maintenance
insta2spotify
instagram-poster
isponsorblocktv
job-hunter
jsoncrack
k8s-dashboard
k8s-portal
k8s-version-upgrade k8s-version-upgrade: make tigera-operator restore crash-safe (EXIT trap) 2026-06-17 18:25:54 +00:00
keel
kms
kured
kyverno
linkwarden
llama-cpp
local-path
mailserver
matrix
meshcentral
metallb
metrics-server
monitoring k8s-version-upgrade: scope chain-fail alert to terminal reasons + sync docs 2026-06-17 13:10:18 +00:00
n8n
navidrome
netbox
networking-toolbox
nextcloud
nextcloud-todos
nfs-csi
nodelocal-dns
novelapp
ntfy
nvidia
onlyoffice
openclaw
osm_routing
owntracks
paperless-mcp
paperless-ngx
payslip-ingest
phpipam
platform
plotting-book
poison-fountain
postiz
priority-pass priority-pass: bump image_tag to 63e118c3 [ci skip] 2026-06-16 17:45:33 +00:00
privatebin
proxmox-csi
pvc-autoresizer
rbac
real-estate-crawler
recruiter-responder
redis
reloader
resume
reverse-proxy
rybbit
sealed-secrets
send
servarr mam-farming: make MAMFarmingStuck a grabber heartbeat, not a grab-count check 2026-06-16 08:18:33 +00:00
shadowsocks
speedtest
status-page
stem95su
stirling-pdf
t3-afk t3-afk: fix agent Bash — stop mounting into ~/.claude 2026-06-15 20:49:34 +00:00
t3code
tandoor
technitium
terminal
tor-proxy
trading-bot
traefik health: dedicated 100/1000 rate limit for the redesigned SPA 2026-06-14 13:03:51 +00:00
trek
tripit Merge remote-tracking branch 'forgejo/master' into wizard/tripit-ingest-model 2026-06-16 20:39:30 +00:00
tts
tuya-bridge
uptime-kuma uptime-kuma: add CONTEXT.md + ADR-0001 (intentionally lean; sizing/placement review) 2026-06-14 09:11:22 +00:00
url
vault
vaultwarden
vpa
wealthfolio
webhook_handler
whisper
wireguard
woodpecker
xray
ytdlp