infra/stacks
Viktor Barzin 9c68d147e0
Some checks failed
ci/woodpecker/push/postmortem-todos Pipeline was successful
ci/woodpecker/push/default Pipeline failed
k8s-upgrade: reclaim+auto-prune kubeadm /etc/kubernetes/tmp leak; correct crash root cause to etcd IO (not OIDC)
Digging into "why did the apiserver crash" disproved the earlier OIDC
explanation. An isolated v1.35.6 apiserver repro with authentik reachable
initialises OIDC cleanly (oidc.go:313, no error) and runs fine — so the
--authentication-config -> --oidc-* revert is NOT what crashed it. etcd's
surviving crash-window log is the real cause: 1180 "apply request took too long"
warnings in 16 min, individual applies up to 4.3s (healthy <100ms) right as
kubeadm tried to bring up the new apiserver. That's etcd IO starvation on the
shared sdc HDD (beads code-oflt).
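
To make the slow-apply count reproducible, here is a minimal sketch of tallying those warnings from an etcd log. It assumes the log is etcd's JSON-lines output with a `msg` field and a Go-style duration in `took` (for example `4.3s`). The function names are hypothetical and not part of this repo. CRI-prefixed lines from /var/log/pods are not valid JSON and are skipped as written.

```python
import json
import re

# Go duration units -> seconds. Assumption: "took" uses Go's Duration.String() format.
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text):
    """Convert a Go-style duration such as '4.3s' or '1m2.5s' to seconds."""
    parts = _PART.findall(text)
    # Reject strings that the pattern only partially covers.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def summarize_slow_applies(lines, slow_msg="apply request took too long"):
    """Return (warning_count, worst_apply_seconds) for the given log lines."""
    count = 0
    worst = 0.0
    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # not a JSON log line; ignore it
        if not isinstance(entry, dict) or entry.get("msg") != slow_msg:
            continue
        count += 1
        try:
            worst = max(worst, parse_go_duration(str(entry.get("took", ""))))
        except ValueError:
            pass  # count the warning even if its duration is unparseable
    return count, worst
```

Feeding it the surviving crash-window log (for example `summarize_slow_applies(open(path))`) should give the warning count and the slowest single apply, which can then be compared against the <100ms healthy baseline.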

A major contributor, and the reason the master's root fs sat at 73%: kubeadm
dumps a full ~400MB etcd DB backup into
/etc/kubernetes/tmp/kubeadm-backup-etcd-<ts>/ before every etcd upgrade and
never cleans it up. 145 dirs / 28GB had accumulated, driving image-GC churn and
extra write IO onto etcd's spindle. Reclaimed live (73% -> 23%) and added a
preflight prune (>3 days) so it can't re-accumulate.
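
The preflight prune itself is not shown on this page. As a rough illustration only, a prune of that shape could look like the sketch below. It assumes age is judged by directory mtime and that only directories matching `kubeadm-backup-etcd-*` are touched. The function name and the `dry_run` default are hypothetical.

```python
import shutil
import time
from pathlib import Path


def prune_etcd_backups(tmp_dir="/etc/kubernetes/tmp", max_age_days=3, dry_run=True, now=None):
    """Return kubeadm etcd backup dirs older than max_age_days; delete them unless dry_run."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    root = Path(tmp_dir)
    pruned = []
    if not root.is_dir():
        return pruned  # nothing to do on a node without the directory
    for entry in sorted(root.glob("kubeadm-backup-etcd-*")):
        # Only real directories are candidates; never follow symlinks.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if entry.stat().st_mtime < cutoff:
            pruned.append(entry)
            if not dry_run:
                shutil.rmtree(entry)
    return pruned
```

Defaulting to a dry run keeps an accidental invocation harmless. A real preflight would call it with `dry_run=False` and might prefer parsing the `<ts>` suffix over trusting mtime.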

Also corrected the OIDC handling: the kubeadm-config drift is real but only
breaks dashboard/kubectl SSO AFTER a successful upgrade (recoverable via the
chain's restore.sh + the kubeadm-config reconciliation) — it does not crash the
apiserver. So the preflight check is now an ALERT, not a block (was added on the
wrong hypothesis). Post-mortem, runbook, and apiserver-oidc.tf header corrected.

Per Viktor: reclaim the disk and automate so the manual cleanup never recurs;
the durable IO fix remains code-oflt (etcd off the shared HDD).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 15:23:15 +00:00
_template fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
actualbudget eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
affine eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
android-emulator android-emulator: fix idle-sleeper dying with SIGPIPE before it could sleep 2026-06-24 08:57:36 +00:00
anisette fix(anisette): wait_for_rollout=false so a slow first start can't strand the deploy out of state 2026-06-14 20:56:30 +00:00
authentik eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
beads-server eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
blog fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
broker-sync eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
calico calico: enable Goldmane + Whisker (Calico 3.30 OSS flow observability) 2026-06-24 12:22:48 +00:00
changedetection eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
chrome-service chrome-service: run real Google Chrome (H.264/AAC codecs) for the browser 2026-06-22 21:15:36 +00:00
ci-pipeline-health eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
city-guesser fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
claude-agent-service eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
claude-breakglass eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
claude-memory eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
cloudflared cloudflared: disable in-place autoupdate (--no-autoupdate) 2026-06-10 21:00:05 +00:00
cnpg fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
coturn eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
crowdsec traefik/crowdsec: remove dead Yaegi-plugin middleware reference (PR1/2) 2026-06-21 00:15:12 +00:00
cyberchef fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
dashy fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
dawarich eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
dbaas dbaas: document postgresql-backup startingDeadlineSeconds rationale 2026-06-13 14:22:24 +00:00
descheduler etcd-load-reduction: remove VPA/Goldilocks, disable kyverno reporting, descheduler hourly 2026-06-12 19:41:22 +00:00
diun eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
ebook2audiobook fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
ebooks eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
echo fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
excalidraw fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
external-secrets eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
f1-stream eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
fire-planner eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
forgejo eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
freedify eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
freshrss eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
frigate fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
goldmane-edge-aggregator deploy goldmane-edge-aggregator: durable who-talks-to-whom edge trail (#58, ADR-0014) 2026-06-24 20:59:39 +00:00
grampsweb eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
hackmd eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
headscale fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
health eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
hermes-agent eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
homepage fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
immich eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
infra fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
infra-maintenance fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
insta2spotify eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
instagram-poster instagram-poster: force_conflicts on ESO manifests (fix apply) 2026-06-24 20:49:53 +00:00
isponsorblocktv fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
job-hunter eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
jsoncrack fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
k8s-dashboard eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
k8s-portal k8s-portal: wire private-ghcr pull (allowlist + imagePullSecrets) 2026-06-13 15:38:42 +00:00
k8s-version-upgrade k8s-upgrade: reclaim+auto-prune kubeadm /etc/kubernetes/tmp leak; correct crash root cause to etcd IO (not OIDC) 2026-06-25 15:23:15 +00:00
keel fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
kms eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
kured fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
kyverno deploy goldmane-edge-aggregator: durable who-talks-to-whom edge trail (#58, ADR-0014) 2026-06-24 20:59:39 +00:00
linkwarden eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
llama-cpp fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
local-path fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
mailserver eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
matrix eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
meshcentral fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
metallb fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
metrics-server fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
monitoring fix(monitoring): force_conflicts on grafana_db_creds ExternalSecret 2026-06-24 12:25:36 +00:00
n8n eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
navidrome eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
netbox eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
networking-toolbox fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
nextcloud eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
nextcloud-todos eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
nfs-csi fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
nodelocal-dns fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
novelapp eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
ntfy fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
nvidia fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
onlyoffice eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
openclaw eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
osm_routing fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
owntracks eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
paperless-ai eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
paperless-mcp eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
paperless-ngx eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
payslip-ingest eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
phpipam eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
platform fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
plotting-book eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
poison-fountain traefik/crowdsec: remove dead Yaegi-plugin middleware reference (PR1/2) 2026-06-21 00:15:12 +00:00
portal-assistant portal-assistant: land voice stacks + switch TTS to edge-tts (intelligible Bulgarian) 2026-06-17 20:25:29 +00:00
portal-realtime portal-realtime: deploy the v2 full-duplex voice agent (Pipecat) 2026-06-20 08:23:17 +00:00
portal-stt portal-stt: drop setup_tls_secret module (ClusterIP-only, no fullchain.pem) 2026-06-17 20:29:31 +00:00
portal-tts portal-tts: docker.io/ prefix on edge-tts image (Kyverno trusted-registries) 2026-06-17 21:24:34 +00:00
postiz eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
priority-pass priority-pass: bump image_tag to 63e118c3 [ci skip] 2026-06-16 17:45:33 +00:00
privatebin fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
proxmox-csi eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
pvc-autoresizer fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
rbac k8s-upgrade: reclaim+auto-prune kubeadm /etc/kubernetes/tmp leak; correct crash root cause to etcd IO (not OIDC) 2026-06-25 15:23:15 +00:00
real-estate-crawler eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
recruiter-responder eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
redis fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
reloader fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
resume eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
reverse-proxy traefik/crowdsec: remove 6 hard-coded middleware refs the variable sweep missed (PR1/2) 2026-06-21 00:17:40 +00:00
rybbit eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
sealed-secrets fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
send fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
servarr eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
shadowsocks eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
speedtest eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
status-page fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
stem95su eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
stirling-pdf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
t3-afk eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
t3code t3-probe: fix aiohttp 3.9 compat (ClientWSTimeout is 3.10+) 2026-06-10 21:26:09 +00:00
tandoor eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
technitium eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
terminal docs: sync CI/CD docs to ADR-0002 final state (ghcr + Woodpecker deploy-only) [ci skip] 2026-06-13 12:55:49 +00:00
tor-proxy fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
trading-bot eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
traefik traefik/crowdsec: delete dead Yaegi plugin + middleware CRD + captcha (PR2/2) 2026-06-21 13:35:13 +00:00
trek fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
tripit eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
tts tts: TCP probes — http liveness killed the server mid-synthesis 2026-06-12 20:57:28 +00:00
tuya-bridge eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
uptime-kuma uptime-kuma: add CONTEXT.md + ADR-0001 (intentionally lean; sizing/placement review) 2026-06-14 09:11:22 +00:00
url eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
vault deploy goldmane-edge-aggregator: durable who-talks-to-whom edge trail (#58, ADR-0014) 2026-06-24 20:59:39 +00:00
vaultwarden fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
vpa etcd-load-reduction: remove VPA/Goldilocks, disable kyverno reporting, descheduler hourly 2026-06-12 19:41:22 +00:00
wealthfolio eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
webhook_handler eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00
whisper fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
wireguard fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
woodpecker eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 2026-06-22 19:13:04 +00:00
xray fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
ytdlp eso: complete migration — chart 2.6.0, all CRs on v1, 1.35 gate cleared 2026-06-23 09:55:51 +00:00