apply-mbps-caps: compare normalized option sets (true idempotency) + devvm I/O-stall post-mortem [ci skip]
The raw string compare never matched qm config's canonical key order, so the hourly timer re-issued 'qm set' against every running capped VM, live-rewriting QEMU throttle state via QMP 24x/day.

Implicated in today's devvm freeze (15:21-16:48 UTC): the guest's disk I/O stalled inside QEMU (blockstats frozen at 0 while QMP stayed responsive) on the legacy lsi controller path with no iothread.

Viktor asked to root-cause the freeze before choosing fixes, then approved mitigating via VM settings: this commit fixes the hourly trigger and documents the incident; the controller swap (virtio-scsi-single + iothread=1 + aio=threads) is staged on VM 102 separately, pending his cold stop/start.

Adds docs/post-mortems/2026-06-11-devvm-qemu-io-stall.md (evidence chain, ruled-out causes, capture-before-kill autopsy steps) and syncs compute.md + proxmox-inventory.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent 2e0cebff87
commit c3a63fcd38
4 changed files with 136 additions and 4 deletions
@@ -101,7 +101,12 @@ graph TB
 > PVE host (sources in `infra/scripts/`, install pattern per
 > `architecture/backup-dr.md`). Timer fires `OnBootSec=5min` +
 > `OnCalendar=hourly`, so any drift (config restore, manual `qm
-> set`, fresh clone) self-heals within the hour. Current caps:
+> set`, fresh clone) self-heals within the hour. The script compares
+> *normalized option sets*, so an unchanged config is a true no-op —
+> until 2026-06-11 a raw string compare (defeated by `qm config`'s
+> canonical key order) re-issued `qm set` hourly against running VMs,
+> live-rewriting QEMU throttle state via QMP (implicated in the devvm
+> I/O stall; see `post-mortems/2026-06-11-devvm-qemu-io-stall.md`). Current caps:
 > 102 devvm 60/60, 103 home-assistant 40/40, 200 k8s-master 100/60,
 > 201 k8s-node1 150/120, 202 k8s-node2 150/120, 203 k8s-node3 150/120,
 > 204 k8s-node4 150/120, 220 docker-registry 40/40.
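
The normalized comparison described in this change can be sketched as follows. This is a minimal illustration in Python, not the actual `apply-mbps-caps` script (its source is not shown here). The function names `parse_options` and `needs_update`, the example volume name, and the reading of "60/60" as read/write caps are all assumptions made for the example.

```python
from __future__ import annotations


def parse_options(value: str) -> frozenset[str]:
    """Turn a comma-separated drive option string into an order-insensitive set."""
    return frozenset(part.strip() for part in value.split(",") if part.strip())


def needs_update(current: str, desired: str) -> bool:
    """Return True only when the two option strings differ as sets."""
    return parse_options(current) != parse_options(desired)


if __name__ == "__main__":
    # Hypothetical strings: same options, different key order.
    current = "local-lvm:vm-102-disk-0,iothread=1,mbps_rd=60,mbps_wr=60,size=64G"
    desired = "local-lvm:vm-102-disk-0,mbps_rd=60,mbps_wr=60,size=64G,iothread=1"

    print(current == desired)              # raw string compare sees a difference
    print(needs_update(current, desired))  # set compare treats them as equal
```

With a check of this shape, a reordered but otherwise identical option string would not trigger another `qm set`, which is the idempotency property the commit title refers to.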