From 9fd54143c27be947a2448a72f4c200cf4b218e0c Mon Sep 17 00:00:00 2001 From: Viktor Barzin Date: Tue, 19 May 2026 13:10:00 +0000 Subject: [PATCH] =?UTF-8?q?docs:=20design=20+=20plan=20for=20MySQL=208.4.8?= =?UTF-8?q?=20=E2=86=92=208.4.9=20upgrade?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Captures the wipe+reinit strategy (sidestep the broken DD upgrade path), the IO config bump (innodb_io_capacity 100→2000), root-cause analysis with explicit uncertainty, verification gates, and rollback. Not scheduled yet. Tracked in beads code-963q. Co-Authored-By: Claude Opus 4.7 --- .../2026-05-19-mysql-8.4.9-upgrade-design.md | 112 ++++++ .../2026-05-19-mysql-8.4.9-upgrade-plan.md | 349 ++++++++++++++++++ 2 files changed, 461 insertions(+) create mode 100644 docs/plans/2026-05-19-mysql-8.4.9-upgrade-design.md create mode 100644 docs/plans/2026-05-19-mysql-8.4.9-upgrade-plan.md diff --git a/docs/plans/2026-05-19-mysql-8.4.9-upgrade-design.md b/docs/plans/2026-05-19-mysql-8.4.9-upgrade-design.md new file mode 100644 index 00000000..7b5abf9a --- /dev/null +++ b/docs/plans/2026-05-19-mysql-8.4.9-upgrade-design.md @@ -0,0 +1,112 @@ +# MySQL 8.4.8 → 8.4.9 Upgrade — Design + +**Date**: 2026-05-19 +**Status**: Drafted, **NOT scheduled**. Execute only inside a planned maintenance window with user sign-off. +**Beads**: (filed alongside this doc) +**Related**: `docs/runbooks/restore-mysql.md`, beads `code-eme8` / `code-k40p` (closed in `ea475c3d`) + +## Background + +On 2026-05-18, Keel auto-bumped the `mysql:8.4` floating tag on the +`mysql-standalone` StatefulSet from 8.4.8 to 8.4.9. The in-server data +dictionary upgrade (80408 → 80409) stalled reliably: ~24 s of writes to +`mysql.ibd` + redo log after "Server upgrade started", then complete +silence — no CPU, no flushes, no errors, no completion. The `boot` +thread sat in user-space sleep (`State: S`, `wchan: 0`) for 10+ +minutes; the MySQLX socket appeared but `mysqld.sock` never did. Even +with `liveness_probe.initial_delay_seconds = 600`, the upgrade never +completed. + +Recovery (commit `ea475c3d`): pinned image to `mysql:8.4.8` exactly, +wiped the corrupted PVC, restored from the 00:30 UTC mysqldump. Total +downtime: ~25 min. Forgejo + 7 dependent apps offline during that +window. + +## Root cause — best evidence + +We never proved this definitively because we couldn't connect to MySQL +during the stall, but the strongest hypothesis is **flush starvation +during the DD upgrade's mandatory checkpoint**: + +1. Upgrade rewrites `mysql.st_spatial_reference_systems` (5103 SRS + defs) + dirties pages across the system tablespace. +2. Reaches a point where it must checkpoint before continuing. +3. The page-cleaner thread can't drain dirty pages fast enough because + `innodb_io_capacity=100` (1.6 MB/s effective flush rate, default is + 200, recommended for SSDs is 2000+) combined with + `innodb_page_cleaners=1`. +4. The `boot` thread waits on a pthread condvar that the flush + coordinator should signal but never does within probe timeout. + +Why we're not 100 % certain: +- LUKS2-encrypted block storage (`proxmox-lvm-encrypted`) may + contribute its own flush latency. +- We didn't capture a stack trace from the stalled `boot` thread + (`/proc/1/task/118/stack` was `permission denied`). +- A genuine MySQL 8.4.9 bug in the SRS-update path is possible (worth + checking the MySQL bug tracker before retry). + +**Organizational root cause** (definitive): the `mysql:8.4` floating +tag let Keel auto-bump without testing. Already fixed — image pinned +to `mysql:8.4.8` exactly. + +## Decisions + +| # | Decision | Notes | +|---|----------|-------| +| 1 | **Approach: wipe + re-init on 8.4.9** (logical migration via fresh init + dump-restore) | The DD upgrade is the broken path. A fresh 8.4.9 init starts at version 80409 directly — no upgrade ever runs. We've executed wipe+restore once in ~25 min; the path is now well-trodden. | +| 2 | **Pre-flight: bump InnoDB IO config** | `innodb_io_capacity=2000`, `innodb_io_capacity_max=4000`, `innodb_page_cleaners=4`. These are the long-term-correct values regardless of the upgrade — current settings are ~10× too conservative for the workload. | +| 3 | **Restore strategy: per-database dumps, NOT the full `--all-databases` dump** | Per-db dumps at `/srv/nfs/mysql-backup/per-db//` skip the `mysql` system schema entirely. Avoids the question of "will 8.4.8 mysql-schema rows confuse 8.4.9". User accounts get recreated via Vault + null_resource. | +| 4 | **Fresh dump immediately before cutover, not yesterday's** | The daily dump runs at 00:30 UTC. The cutover dump must come from < 60 s before scale-to-0 to minimize data loss. Kick `mysql-backup-per-db` CronJob manually. | +| 5 | **Maintenance window required** | All MySQL-dependent apps offline ~25 min: Forgejo (+ registry → ImagePullBackOff cascade), Nextcloud, HackMD, Grafana, Paperless, Uptime-Kuma, Shlink, realestate-crawler, phpipam, technitium, vikunja, freshrss, finance, resume. Pick a low-traffic window (suggest Sunday 03:00 UK). | +| 6 | **Single rollback path: re-pin to 8.4.8 + same wipe/restore flow** | If 8.4.9 fresh init misbehaves post-restore, rollback IS the same procedure, just with image=8.4.8. The pinned 8.4.8 dump survives. No new failure modes. | +| 7 | **Out of scope for this upgrade**: tuning that doesn't gate the upgrade | Right-sizing buffer pool, switching to async commits, changing storage class, replication — all separate decisions. | + +## Verification gates + +Before declaring done: +1. `kubectl -n dbaas exec mysql-standalone-0 -- mysql -uroot -p"$PW" -e "SELECT VERSION();"` returns `8.4.9`. +2. `SHOW DATABASES;` lists all 20 user databases. +3. Table count per schema matches the pre-upgrade snapshot (recorded + in step 1 of the plan). +4. `forgejo` logs show successful DB ping; `kubectl -n forgejo get pod` is 1/1 Running. +5. `kubectl get deploy,sts -A` shows no unready workloads. +6. `bash infra/scripts/cluster_healthcheck.sh --quiet` returns same or + better PASS/WARN/FAIL ratio as pre-upgrade. +7. Forgejo integrity probe reports 0 failures (manual trigger). +8. `RegistryCatalogInaccessible` not firing in Prometheus. + +## Risks + mitigations + +| Risk | Likelihood | Mitigation | +|---|---|---| +| 8.4.9 fresh init has *some other* unobserved bug | Low | Smoke-test on a parallel PVC in dbaas before touching the real one (optional but cheap — adds 30 min). See plan Phase 1. | +| Per-db dump-restore misses a database the user added recently | Low | Compare `SHOW DATABASES` against the per-db dump directory listing pre-cutover. If a DB exists in MySQL but not in `/srv/nfs/mysql-backup/per-db/`, dump it manually first. | +| Forgejo/roundcubemail static-user passwords drift again after restore | Certain | Already documented in runbook — DROP USER + CREATE USER from Vault values immediately after restore. | +| The cutover dump itself is corrupt | Very low | mysqldump exits non-zero on failure. CronJob already pushes `backup_last_success_timestamp` to Pushgateway. Verify timestamp is fresh before proceeding. | +| Apps fail to reconnect after MySQL restart | Low | Already-proven recipe: `kubectl rollout restart` on the affected deployments. Listed exhaustively in runbook §B.8. | +| 8.4.9 fresh init *also* stalls (root cause was NOT flush starvation) | Medium-low | Pre-flight test on parallel PVC catches this before maintenance window. If real prod init stalls, immediately revert TF pin to 8.4.8, redo same dump-restore flow. Same 25 min downtime as the original recovery. | + +## Why not alternatives + +- **In-place DD upgrade with bumped IO config**: simpler, but if it + still stalls we lose 30–60 min waiting + still fall back to + wipe+restore. Same data risk; worse expected time. We *would* learn + whether the bumped IO settings fix the upgrade, but the fresh init + approach makes that knowledge unnecessary. +- **Parallel migration (new mysql-standalone-new pod alongside)**: + cleanest rollback (instant via service-selector flip), but needs TF + surgery to declare two StatefulSets temporarily and isn't worth the + complexity when the wipe+restore approach is now proven. +- **Wait for 8.4.10 / 8.5 LTS**: leaves us stuck on 8.4.8 indefinitely. + Acceptable for now (we're pinned), but not a permanent answer. + +## Out of scope + +- A standby/replica MySQL for zero-downtime upgrades (separate + initiative — see future planning around CNPG-style HA for MySQL). +- Removing `proxmox-lvm-encrypted` LUKS2 from the equation (the + encryption is a security requirement; debugging its flush latency is + separate). +- Replacing MySQL with PostgreSQL (long-term goal for some apps; not + this upgrade). diff --git a/docs/plans/2026-05-19-mysql-8.4.9-upgrade-plan.md b/docs/plans/2026-05-19-mysql-8.4.9-upgrade-plan.md new file mode 100644 index 00000000..2c58b814 --- /dev/null +++ b/docs/plans/2026-05-19-mysql-8.4.9-upgrade-plan.md @@ -0,0 +1,349 @@ +# MySQL 8.4.8 → 8.4.9 Upgrade — Plan + +**Date**: 2026-05-19 +**Status**: Drafted, **NOT scheduled** +**Design**: `2026-05-19-mysql-8.4.9-upgrade-design.md` +**Estimated downtime**: 25–30 min (all MySQL-dependent apps offline) +**Window**: Suggest Sunday 03:00 UK (low traffic, kured window doesn't fight us) + +## Pre-flight (before the maintenance window) + +### P.1 Optional smoke test on a parallel PVC (recommended, +30 min) + +In a non-production session, before scheduling the real cutover: + +```bash +# 1. Create a temporary StatefulSet `mysql-smoketest` in dbaas with the +# same image (mysql:8.4.9), same configmap, brand-new PVC. +# Use a one-off kubectl apply -f /tmp/smoketest.yaml — NOT Terraform — +# so it doesn't pollute the real stack. +# 2. Verify it inits to 8.4.9 cleanly (mysqld.sock appears, "ready for connections"). +# 3. Restore one of the smaller per-db dumps (e.g. resume, freshrss) into it. +# 4. Delete the smoketest StatefulSet + PVC. +``` + +Outcome: +- ✅ Init succeeds → proceed with the real upgrade with high confidence. +- ❌ Init stalls → root cause was not flush starvation. Halt and re-investigate. The real upgrade is unsafe. + +### P.2 Read the MySQL 8.4.9 release notes + bug tracker + +Specifically look for issues filed since 8.4.9 GA against the DD upgrade +path or `st_spatial_reference_systems`. If a known fix landed in 8.4.10 +or 8.5.x, consider waiting. + +### P.3 Confirm backup pipeline is healthy + +```bash +# Latest per-db dumps exist for all 20 databases +kubectl -n dbaas exec mysql-standalone-0 -- bash -c \ + 'for d in $(ls /backup/per-db/); do echo -n "$d: "; ls -t /backup/per-db/$d/ | head -1; done' + +# Pushgateway shows recent success +kubectl -n monitoring exec deploy/prometheus-server -c prometheus-server -- \ + wget -qO- 'http://prometheus-prometheus-pushgateway:9091/metrics' | grep mysql-backup-per-db +``` + +### P.4 Pin maintenance window and notify + +Brief the user. Confirm window. Disable any background scrapers / +schedulers / bots that would create noise during the cutover. + +## Execution (inside the maintenance window) + +### Step 1 — Pre-flight snapshot + +```bash +ROOT_PWD=$(kubectl -n dbaas get secret cluster-secret -o jsonpath='{.data.ROOT_PASSWORD}' | base64 -d) + +# Record current state for verification later +kubectl -n dbaas exec mysql-standalone-0 -- mysql -uroot -p"$ROOT_PWD" \ + -e "SELECT table_schema, COUNT(*) AS tables FROM information_schema.tables \ + WHERE table_schema NOT IN ('information_schema','performance_schema','sys','mysql') \ + GROUP BY table_schema;" > /tmp/mysql-pre-upgrade-table-counts.txt +cat /tmp/mysql-pre-upgrade-table-counts.txt +``` + +### Step 2 — Trigger a fresh per-db dump + +```bash +kubectl -n dbaas create job --from=cronjob/mysql-backup-per-db pre-upgrade-$(date +%s) +# Wait for completion (typically <2 min) +kubectl -n dbaas wait --for=condition=complete --timeout=300s job/pre-upgrade- +``` + +Verify all 20 databases dumped: + +```bash +kubectl -n dbaas exec mysql-standalone-0 -- bash -c \ + 'for d in $(ls /backup/per-db/); do + newest=$(ls -t /backup/per-db/$d/ | head -1) + echo "$d: $newest" + done' +``` + +Every entry should have a `dump__*.sql.gz` listed. + +### Step 3 — Bump InnoDB IO config + image pin in Terraform + +In `stacks/dbaas/modules/dbaas/main.tf`: + +```diff +- innodb_io_capacity=100 +- innodb_io_capacity_max=200 +- innodb_page_cleaners=1 ++ innodb_io_capacity=2000 ++ innodb_io_capacity_max=4000 ++ innodb_page_cleaners=4 +``` + +```diff +- # Pinned to 8.4.8 — 8.4.9 DD upgrade got stuck (no progress, no CPU) +- # repeatedly across multiple attempts. ... +- image = "mysql:8.4.8" ++ # Re-pinned to 8.4.9 on 2026-MM-DD after the wipe+reinit upgrade ++ # path (see docs/plans/2026-05-19-mysql-8.4.9-upgrade-*). ++ image = "mysql:8.4.9" +``` + +Commit but **do not apply yet**. + +### Step 4 — Stop MySQL + +```bash +kubectl -n dbaas scale statefulset mysql-standalone --replicas=0 +# Wait for pod deletion +kubectl -n dbaas wait --for=delete pod/mysql-standalone-0 --timeout=120s +``` + +### Step 5 — Wipe the PVC + +```bash +PV=$(kubectl -n dbaas get pvc data-mysql-standalone-0 -o jsonpath='{.spec.volumeName}') +kubectl patch pv "$PV" -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}' +kubectl -n dbaas delete pvc data-mysql-standalone-0 +# Confirm PV vanishes (CSI cleans up the LV) +kubectl get pv | grep -q "$PV" && echo "WARNING: PV still present" || echo "PV cleaned up" +``` + +### Step 6 — Apply Terraform (8.4.9 + bumped IO) + +```bash +cd stacks/dbaas +/home/wizard/code/infra/scripts/tg apply +``` + +This creates a fresh 5 Gi PVC + new pod on `mysql:8.4.9`. Initial-init +takes ~30 s. Verify: + +```bash +kubectl -n dbaas wait --for=condition=ready pod/mysql-standalone-0 --timeout=300s +kubectl -n dbaas exec mysql-standalone-0 -- mysql -uroot -p"$ROOT_PWD" -e "SELECT VERSION();" +# expect: 8.4.9 +``` + +**If the pod fails to become Ready within 5 min**: this is the +"root cause was not flush starvation" failure mode. Abort the upgrade, +revert the image pin to 8.4.8 in TF, re-run from Step 4 (wipe + apply +8.4.8 + restore). Total extra downtime ~25 min. + +### Step 7 — Restore per-db dumps (NOT the full --all-databases dump) + +```bash +ROOT_PWD=$(kubectl -n dbaas get secret cluster-secret -o jsonpath='{.data.ROOT_PASSWORD}' | base64 -d) + +cat <`. +Expected time: ~3 min for all 20 databases. + +### Step 8 — Recreate Vault-rotated + static users + +The per-db restore did NOT touch `mysql.user`. Recreate all app users +fresh: + +```bash +# Static users (forgejo, roundcubemail) from Vault +FORGEJO_PW=$(vault kv get -field=mysql_forgejo_password secret/viktor) +RC_PW=$(vault kv get -field=mysql_roundcubemail_password secret/viktor) + +kubectl -n dbaas exec -i mysql-standalone-0 -- bash -c 'mysql -uroot -p"$MYSQL_ROOT_PASSWORD"' < 0) | .metadata.name' | \ + wc -l)" -eq 0 ]; do + sleep 5 +done +echo "All workloads ready" +``` + +### Step 10 — Force ImagePullBackOff pods to retry (Forgejo registry was offline) + +```bash +for ns in chrome-service fire-planner freedify; do + kubectl -n "$ns" delete pod --all 2>/dev/null || true +done +``` + +### Step 11 — Clean up failed CronJob pods from the outage window + +```bash +kubectl delete pods -A --field-selector=status.phase=Failed +``` + +### Step 12 — Verify (matches design §Verification gates) + +```bash +# 1. Version +kubectl -n dbaas exec mysql-standalone-0 -- mysql -uroot -p"$ROOT_PWD" -e "SELECT VERSION();" +# expect: 8.4.9 + +# 2-3. Databases + table counts +kubectl -n dbaas exec mysql-standalone-0 -- mysql -uroot -p"$ROOT_PWD" \ + -e "SELECT table_schema, COUNT(*) FROM information_schema.tables \ + WHERE table_schema NOT IN ('information_schema','performance_schema','sys','mysql') \ + GROUP BY table_schema;" > /tmp/mysql-post-upgrade-table-counts.txt +diff /tmp/mysql-pre-upgrade-table-counts.txt /tmp/mysql-post-upgrade-table-counts.txt +# expect: no diff (or only counts that grew between snapshots) + +# 4. Forgejo +kubectl -n forgejo get pod +kubectl -n forgejo logs deploy/forgejo --tail=20 | grep -iE "ORM engine|ready" +# expect: 1/1 Running, "ORM engine initialized" + +# 5. Cluster health +bash /home/wizard/code/infra/scripts/cluster_healthcheck.sh --quiet + +# 6. Registry integrity probe +kubectl -n monitoring create job --from=cronjob/forgejo-integrity-probe \ + postupgrade-$(date +%s) +kubectl -n monitoring logs job/postupgrade- --tail=5 +# expect: "Probe complete: 0 failures" + +# 7. RegistryCatalogInaccessible not firing +kubectl -n monitoring exec deploy/prometheus-server -c prometheus-server -- \ + wget -qO- 'http://localhost:9090/api/v1/alerts' | \ + python3 -c "import json,sys; d=json.load(sys.stdin); [print(a['labels']['alertname']) for a in d['data']['alerts'] if a['state']=='firing']" +# expect: empty / no RegistryCatalogInaccessible +``` + +### Step 13 — Commit + push the Terraform change + +```bash +git add stacks/dbaas/modules/dbaas/main.tf +git commit -m "dbaas: pin MySQL to 8.4.9 after successful wipe+reinit upgrade + +Executed per docs/plans/2026-05-19-mysql-8.4.9-upgrade-{design,plan}.md. +The full upgrade ran clean — fresh init on 8.4.9 sidestepped the DD +upgrade stall. IO config bumped to 2000/4 (was 100/1) for the workload. +" +git push +``` + +## Rollback path (if Step 6 or Step 7 fails catastrophically) + +The wipe at Step 5 is destructive — once executed, the original disk +is gone. Rollback is **same procedure, image=8.4.8**: + +1. Edit TF: `image = "mysql:8.4.8"` +2. `kubectl -n dbaas scale sts mysql-standalone --replicas=0` +3. Re-wipe (already wiped; just `tg apply`) +4. Run the Step 7 restore Job again (now on 8.4.8) +5. Run Step 8-11 +6. Update Terraform comment to reflect retained 8.4.8 pin. + +Extra downtime: ~25 min on top of the existing window. + +## Post-upgrade follow-ups + +- Update `infra/.claude/CLAUDE.md` MySQL row to reflect 8.4.9 pin. +- Update `docs/runbooks/restore-mysql.md` to reflect 8.4.9. +- Re-evaluate whether the new IO config (2000/4) is overkill for the + workload after 1-2 weeks — could drop to 1000/2 if needed. +- Optional: file a follow-up task to investigate MySQL HA/replication + so the next upgrade isn't blocking.