infra/docs/runbooks
Viktor Barzin 411524a10d kured: drop Mon-Fri restriction, reboot any day
The weekday-only schedule was a 2026-03-16-incident-era guardrail when
the rest of the safety net was thin. Today's gates — halt-on-alert,
sentinel-gate Check 4 (24h soak via node Ready transitions), the
K8sUpgradeStalled alert, drainTimeout=30m, concurrency=1, and the
sentinel-path fix from earlier today — make weekend reboots safe and
just clear the backlog faster.

Effect: 5 pending node reboots clear in 5 calendar days instead of
queueing up over weekends. The K8s version-upgrade detection at Sun
12:00 UTC self-defers if a Sunday-morning kured reboot fires (the
RecentNodeReboot alert is in the Upgrade Gates ignore-less list for
the version-upgrade preflight — same mechanism kured uses).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 14:16:48 +00:00
..
beads-auto-dispatch.md
forgejo-registry-breakglass.md [ci] Phase 1: infra-ci dual-push + break-glass tarball 2026-05-07 23:29:33 +00:00
forgejo-registry-rebuild-image.md [docs] Forgejo registry image-rebuild runbook 2026-05-07 23:29:33 +00:00
forgejo-registry-setup.md [forgejo] Phase 0 of registry consolidation: prepare Forgejo OCI registry 2026-05-07 23:29:33 +00:00
k8s-node-auto-upgrades.md kured: drop Mon-Fri restriction, reboot any day 2026-05-22 14:16:48 +00:00
k8s-version-upgrade.md kured: drop Mon-Fri restriction, reboot any day 2026-05-22 14:16:48 +00:00
kms-public-exposure.md kms: document native DNS auto-discovery (no client config needed) 2026-05-22 14:16:40 +00:00
mailserver-pfsense-haproxy.md mailserver: split healthcheck path off PROXY-aware listeners + book-search uses ClusterIP 2026-05-05 19:45:33 +00:00
mailserver-proxy-protocol.md
nfs-prerequisites.md
pfsense-unbound.md
proxmox-host.md
r730-ram-upgrade-272gb.md [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00
registry-rebuild-image.md [registry] Stop recurring orphan OCI-index incidents — detection + prevention + recovery 2026-04-19 17:08:28 +00:00
registry-vm.md [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
restore-etcd.md [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00
restore-full-cluster.md [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00
restore-lvm-snapshot.md
restore-mysql.md [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00
restore-postgresql.md [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00
restore-pvc-from-backup.md
restore-vault.md [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00
restore-vaultwarden.md [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00
technitium-apply.md
vault-raft-leader-deadlock.md vault runbook + raft/HA stuck-leader alerts 2026-04-22 12:44:46 +00:00
woodpecker-onboard-forgejo-repo.md [woodpecker] Programmatic Forgejo repo registration 2026-05-10 11:12:36 +00:00