infra/docs/runbooks
Viktor Barzin 077ac97df5
Some checks failed
ci/woodpecker/push/default Pipeline failed
k8s-version-upgrade: auto-restore apiserver OIDC after control-plane bumps
kubeadm upgrade apply regenerates the apiserver static-pod manifest and drops
the --authentication-config flag, silently breaking SSO (kubectl/kubelogin + the
k8s dashboard) until someone manually re-applied the rbac stack. That manual step
ran after every control-plane upgrade — the one thing keeping autonomous patch
upgrades from being truly hands-off (it bit us this cycle: an earlier master bump
left SSO broken until we noticed).

Automate it: the rbac stack now publishes its existing OIDC restore script (the
same one its null_resource runs) to a kube-system/apiserver-oidc-restore
ConfigMap, and the upgrade chain's phase_master re-runs it on master right after
the kubeadm upgrade — while tigera-operator is still quiesced so the flag-add
apiserver restart can't crashloop it. The script is idempotent and health-gates
/livez with auto-rollback; the step is non-fatal (a failure only lags SSO until
the next rbac apply, it won't abort the upgrade). phase_master already self-skips
when master is at target, so this only fires when master was actually upgraded.

The chain SA gets a name-scoped get on that one ConfigMap. Runbook updated: the
manual restore is now a documented fallback (command corrected — it needs
-replace, since the null_resource trigger hash never changes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 06:04:30 +00:00
..
apiserver-audit-logging.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
beads-auto-dispatch.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
breakglass-ssh.md break-glass SSH: drop port-knock for exposed key-only :52222; version host config 2026-06-11 18:23:39 +00:00
breakglass-ui.md claude-breakglass: in-cluster warm break-glass UI for the devvm 2026-06-12 21:40:17 +00:00
chrome-service-snapshot.md workstation: per-user playwright browser MCP for all users, reproducible from git 2026-06-16 20:33:47 +00:00
fan-control.md fan-control docs: sync runbook/env/service/design to the HA-actuator + anti-flap model 2026-06-16 08:11:48 +00:00
forgejo-registry-breakglass.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
forgejo-registry-rebuild-image.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
forgejo-registry-setup.md forgejo pulls: route *.viktorbarzin.me to Technitium, drop /etc/hosts pins [ci skip] 2026-06-10 07:56:31 +00:00
grow-pve-nfs-lv.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
immich-transcode-bitrate.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
job-hunter.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
k8s-node-auto-upgrades.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
k8s-version-upgrade.md k8s-version-upgrade: auto-restore apiserver OIDC after control-plane bumps 2026-06-19 06:04:30 +00:00
kms-public-exposure.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
mailserver-pfsense-haproxy.md pfsense: SNI-routed internal 443 — mail.viktorbarzin.me serves webmail everywhere 2026-06-10 18:41:07 +00:00
mailserver-proxy-protocol.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
nextcloud-add-archive.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
nfs-prerequisites.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
offboard-user.md workstation: emo direct master push — allow-then-audit [ci skip] 2026-06-10 14:53:43 +00:00
pfsense-unbound.md dns: pfSense forward-zone for viktorbarzin.me, nodes fully stock [ci skip] 2026-06-10 08:32:34 +00:00
proxmox-host.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
r730-ram-upgrade-272gb.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
registry-rebuild-image.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
registry-vm.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
restore-etcd.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
restore-full-cluster.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
restore-lvm-snapshot.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
restore-mysql.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
restore-postgresql.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
restore-pvc-from-backup.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
restore-vault.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
restore-vaultwarden.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
scale-k8s-cluster.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
security-incident.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
synology-storage.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
t3-drop-attribution.md t3: connection logging across the path for drop attribution 2026-06-11 13:48:10 +00:00
t3-version-bump.md t3: docs for the gated nightly tracker (runbook, post-mortem, service-catalog) 2026-06-16 11:33:49 +00:00
technitium-apply.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
tripit-external-signup.md feat(authentik): TripIt external self-signup group + forward-auth fence (ADR-0020) 2026-06-15 21:48:04 +00:00
vault-raft-leader-deadlock.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
vault-token-renew-devvm.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
woodpecker-onboard-forgejo-repo.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00