# MySQL 8.4.8 → 8.4.9 Upgrade — Design

**Date**: 2026-05-19
**Status**: Drafted, **NOT scheduled**. Execute only inside a planned maintenance window with user sign-off.
**Beads**: (filed alongside this doc)
**Related**: `docs/runbooks/restore-mysql.md`, beads `code-eme8` / `code-k40p` (closed in `ea475c3d`)

## Background

On 2026-05-18, Keel auto-bumped the `mysql:8.4` floating tag on the
`mysql-standalone` StatefulSet from 8.4.8 to 8.4.9. The in-server data
dictionary (DD) upgrade (80408 → 80409) stalled reproducibly: ~24 s of
writes to `mysql.ibd` + redo log after "Server upgrade started", then
complete silence (no CPU, no flushes, no errors, no completion). The `boot`
thread sat in user-space sleep (`State: S`, `wchan: 0`) for 10+
minutes; the MySQLX socket appeared but `mysqld.sock` never did. Even
with `liveness_probe.initial_delay_seconds = 600`, the upgrade never
completed.

Recovery (commit `ea475c3d`): pinned the image to `mysql:8.4.8` exactly,
wiped the corrupted PVC, and restored from the 00:30 UTC mysqldump. Total
downtime: ~25 min. Forgejo + 7 dependent apps were offline during that
window.

## Root cause — best evidence

We never proved this definitively because we couldn't connect to MySQL
during the stall, but the strongest hypothesis is **flush starvation
during the DD upgrade's mandatory checkpoint**:

1. Upgrade rewrites `mysql.st_spatial_reference_systems` (5103 SRS
   defs) + dirties pages across the system tablespace.
2. Reaches a point where it must checkpoint before continuing.
3. The page-cleaner thread can't drain dirty pages fast enough because
   `innodb_io_capacity=100` (~1.6 MB/s effective flush rate at the
   default 16 KiB page size; the default was 200 through MySQL 8.0 and
   is 10000 in 8.4; 2000+ is commonly recommended for SSDs) is combined
   with `innodb_page_cleaners=1`.
4. The `boot` thread waits on a pthread condvar that the flush
   coordinator should signal but never does within the probe timeout.

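The ~1.6 MB/s figure in step 3 can be reproduced with a little arithmetic. This is a rough sketch, assuming that each unit of `innodb_io_capacity` corresponds to about one page flush per second and that pages use the default 16 KiB size. The function names are illustrative, not part of any MySQL tooling.

```python
# Assumption: one unit of innodb_io_capacity ~ one 16 KiB page flushed per second.
PAGE_SIZE_BYTES = 16 * 1024


def flush_rate_mb_per_s(io_capacity: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Approximate background flush throughput in decimal MB/s."""
    return io_capacity * page_size / 1_000_000


def seconds_to_drain(dirty_mb: float, io_capacity: int) -> float:
    """Approximate time to flush `dirty_mb` of dirty pages at this io_capacity."""
    return dirty_mb / flush_rate_mb_per_s(io_capacity)


current = flush_rate_mb_per_s(100)    # setting in effect during the incident
proposed = flush_rate_mb_per_s(2000)  # setting proposed in this design
print(f"io_capacity=100  -> {current:.2f} MB/s")   # io_capacity=100  -> 1.64 MB/s
print(f"io_capacity=2000 -> {proposed:.2f} MB/s")  # io_capacity=2000 -> 32.77 MB/s
```
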
Why we're not 100% certain:
- LUKS2-encrypted block storage (`proxmox-lvm-encrypted`) may
  contribute its own flush latency.
- We didn't capture a stack trace from the stalled `boot` thread
  (reading `/proc/1/task/118/stack` returned `permission denied`).
- A genuine MySQL 8.4.9 bug in the SRS-update path is possible (worth
  checking the MySQL bug tracker before retrying).

**Organizational root cause** (definitive): the `mysql:8.4` floating
tag let Keel auto-bump without testing. Already fixed: the image is
pinned to `mysql:8.4.8` exactly.

## Decisions

| # | Decision | Notes |
|---|----------|-------|
| 1 | **Approach: wipe + re-init on 8.4.9** (logical migration via fresh init + dump-restore) | The DD upgrade is the broken path. A fresh 8.4.9 init starts at version 80409 directly, so no upgrade ever runs. We've executed wipe+restore once in ~25 min; the path is now well-trodden. |
| 2 | **Pre-flight: bump InnoDB IO config** | `innodb_io_capacity=2000`, `innodb_io_capacity_max=4000`, `innodb_page_cleaners=4`. These are the long-term-correct values regardless of the upgrade; current settings are ~10× too conservative for the workload. |
| 3 | **Restore strategy: per-database dumps, NOT the full `--all-databases` dump** | Per-db dumps at `/srv/nfs/mysql-backup/per-db/<db>/` skip the `mysql` system schema entirely. Avoids the question of "will 8.4.8 mysql-schema rows confuse 8.4.9". User accounts get recreated via Vault + `null_resource`. |
| 4 | **Fresh dump immediately before cutover, not yesterday's** | The daily dump runs at 00:30 UTC. The cutover dump must come from < 60 s before scale-to-0 to minimize data loss. Kick the `mysql-backup-per-db` CronJob manually. |
| 5 | **Maintenance window required** | All MySQL-dependent apps offline ~25 min: Forgejo (+ registry → ImagePullBackOff cascade), Nextcloud, HackMD, Grafana, Paperless, Uptime-Kuma, Shlink, realestate-crawler, phpipam, technitium, vikunja, freshrss, finance, resume. Pick a low-traffic window (suggest Sunday 03:00 UK). |
| 6 | **Single rollback path: re-pin to 8.4.8 + same wipe/restore flow** | If 8.4.9 fresh init misbehaves post-restore, rollback IS the same procedure, just with image=8.4.8. The pinned 8.4.8 dump survives. No new failure modes. |
| 7 | **Out of scope for this upgrade**: tuning that doesn't gate the upgrade | Right-sizing buffer pool, switching to async commits, changing storage class, replication: all separate decisions. |

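Decision 4 sets a hard bound on dump age at cutover. The check below is a minimal sketch of that gate. It assumes the dump's completion time is available as a Unix epoch timestamp in seconds (for example the value the CronJob reports); how that value is fetched is left out, and `dump_is_fresh` is a hypothetical helper name.

```python
import time
from typing import Optional

MAX_DUMP_AGE_S = 60  # decision 4: dump must be < 60 s old at scale-to-0


def dump_is_fresh(last_success_ts: float,
                  now: Optional[float] = None,
                  max_age_s: float = MAX_DUMP_AGE_S) -> bool:
    """Return True if the dump finished less than `max_age_s` seconds ago.

    A timestamp in the future (clock skew or a bad value) is treated as
    not fresh, so the operator re-checks instead of proceeding.
    """
    if now is None:
        now = time.time()
    age = now - last_success_ts
    return 0 <= age < max_age_s


# Hypothetical usage: a dump that completed 30 s ago passes the gate.
finished_at = time.time() - 30
print("proceed with scale-to-0" if dump_is_fresh(finished_at) else "re-run the dump")
```

Rejecting future timestamps is a deliberate choice here: a wrong clock should block the cutover rather than silently pass the gate.
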
## Verification gates

Before declaring done:
1. `kubectl -n dbaas exec mysql-standalone-0 -- mysql -uroot -p"$PW" -e "SELECT VERSION();"` returns `8.4.9`.
2. `SHOW DATABASES;` lists all 20 user databases.
3. Table count per schema matches the pre-upgrade snapshot (recorded
   in step 1 of the plan).
4. `forgejo` logs show successful DB ping; `kubectl -n forgejo get pod` shows 1/1 Running.
5. `kubectl get deploy,sts -A` shows no unready workloads.
6. `bash infra/scripts/cluster_healthcheck.sh --quiet` returns the same or
   a better PASS/WARN/FAIL ratio than pre-upgrade.
7. Forgejo integrity probe reports 0 failures (manual trigger).
8. `RegistryCatalogInaccessible` is not firing in Prometheus.

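Gate 3 amounts to a dictionary comparison. The sketch below assumes the pre- and post-upgrade snapshots have already been collected as `{schema: table_count}` mappings (for instance from a `GROUP BY table_schema` query on `information_schema.tables`); the function name and the sample numbers are hypothetical.

```python
from typing import Dict, Optional, Tuple


def diff_table_counts(
    before: Dict[str, int], after: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {schema: (before_count, after_count)} for every mismatch.

    A schema missing from one side shows up with None on that side.
    """
    mismatches = {}
    for schema in sorted(set(before) | set(after)):
        b, a = before.get(schema), after.get(schema)
        if b != a:
            mismatches[schema] = (b, a)
    return mismatches


# Hypothetical snapshots, not real counts from this cluster.
pre = {"forgejo": 112, "nextcloud": 240, "grafana": 58}
post = {"forgejo": 112, "nextcloud": 239}
print(diff_table_counts(pre, post))
# {'grafana': (58, None), 'nextcloud': (240, 239)}
```

An empty result means the gate passes; anything else names the schemas to inspect before declaring done.
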
## Risks + mitigations

| Risk | Likelihood | Mitigation |
|---|---|---|
| 8.4.9 fresh init has *some other* unobserved bug | Low | Smoke-test on a parallel PVC in `dbaas` before touching the real one (optional but cheap: adds 30 min). See plan Phase 1. |
| Per-db dump-restore misses a database the user added recently | Low | Compare `SHOW DATABASES` against the per-db dump directory listing pre-cutover. If a DB exists in MySQL but not in `/srv/nfs/mysql-backup/per-db/`, dump it manually first. |
| Forgejo/roundcubemail static-user passwords drift again after restore | Certain | Already documented in the runbook: `DROP USER` + `CREATE USER` from Vault values immediately after restore. |
| The cutover dump itself is corrupt | Very low | mysqldump exits non-zero on failure. CronJob already pushes `backup_last_success_timestamp` to Pushgateway. Verify the timestamp is fresh before proceeding. |
| Apps fail to reconnect after MySQL restart | Low | Already-proven recipe: `kubectl rollout restart` on the affected deployments. Listed exhaustively in runbook §B.8. |
| 8.4.9 fresh init *also* stalls (root cause was NOT flush starvation) | Medium-low | Pre-flight test on parallel PVC catches this before the maintenance window. If real prod init stalls, immediately revert the TF pin to 8.4.8 and redo the same dump-restore flow. Same 25 min downtime as the original recovery. |

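The "missed database" mitigation is a set difference between what MySQL reports and what the per-db dump directory contains. A minimal sketch, assuming both listings are already available as lists of names (for example from `SHOW DATABASES` output and a directory listing of `/srv/nfs/mysql-backup/per-db/`); `missing_dumps` and the sample names are hypothetical.

```python
from typing import Iterable, List

# MySQL's built-in schemas; the per-db dumps deliberately skip these.
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema", "sys"}


def missing_dumps(show_databases: Iterable[str], dump_dirs: Iterable[str]) -> List[str]:
    """Return user databases that exist in MySQL but have no per-db dump."""
    user_dbs = {db for db in show_databases if db not in SYSTEM_SCHEMAS}
    return sorted(user_dbs - set(dump_dirs))


# Hypothetical listings, not taken from the real cluster.
in_mysql = ["information_schema", "mysql", "sys", "forgejo", "nextcloud", "newapp"]
on_disk = ["forgejo", "nextcloud"]
print(missing_dumps(in_mysql, on_disk))
# ['newapp']
```

Any name in the result would need a manual dump before cutover; extra directories on disk with no matching database are ignored by this check.
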
## Why not alternatives

- **In-place DD upgrade with bumped IO config**: simpler, but if it
  still stalls we lose 30–60 min waiting + still fall back to
  wipe+restore. Same data risk; worse expected time. We *would* learn
  whether the bumped IO settings fix the upgrade, but the fresh init
  approach makes that knowledge unnecessary.
- **Parallel migration (new `mysql-standalone-new` pod alongside)**:
  cleanest rollback (instant via service-selector flip), but needs TF
  surgery to declare two StatefulSets temporarily and isn't worth the
  complexity when the wipe+restore approach is now proven.
- **Wait for 8.4.10 / 8.5 LTS**: leaves us stuck on 8.4.8 indefinitely.
  Acceptable for now (we're pinned), but not a permanent answer.

## Out of scope

- A standby/replica MySQL for zero-downtime upgrades (separate
  initiative; see future planning around CNPG-style HA for MySQL).
- Removing `proxmox-lvm-encrypted` LUKS2 from the equation (the
  encryption is a security requirement; debugging its flush latency is
  separate).
- Replacing MySQL with PostgreSQL (long-term goal for some apps; not
  this upgrade).