Power/temp sweep (2026-06-05) located the cooling-per-watt knee at ~60%:
60->70% buys only -2C for +21W, and 70->100% buys 0C for +54W (the CPU
floors ~59C at cluster load, so more airflow does nothing). Re-tune the
COOL curve to cap its normal band at 60% (~303W, ~61C); 80/100% become a
high-load safety ramp (>=73/79C) before the 83C ceiling. QUIET unchanged
(already at the 281W / 4800rpm floor). Saves up to ~75W (~650 kWh/yr) vs
full-tilt for the last ~2C. Tests + design doc updated; verified live
(63C, 60%, ~267W).
[ci skip]
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The iDRAC stock curve runs the CPU at ~72°C on the 7080 RPM floor even
under load (optimises for quiet, not cool). Add a bash daemon + systemd
unit that drives the chassis fans from CPU temp on two curves, picked by
garage occupancy (the server is in the garage): COOL when empty
(measured ~58-65°C under load), QUIET near the silent floor when the
ha-sofia garage door shows someone is there (open, or <15min since last
activity).
Manual fan mode is backstopped: bash EXIT trap + systemd ExecStopPost
hand fans back to Dell auto on stop/crash; CPU>=83°C or repeated IPMI
failures do the same. Pushgateway metrics (job=fan_control). 36 unit
tests cover the pure curve/hysteresis/presence/parse logic; DRY_RUN +
RUN_ONCE for integration checks. Deployed and verified on 192.168.1.127
(CPU 70->58°C in cool mode, hysteresis stepping confirmed).
Design: docs/plans/2026-06-04-pve-fan-control-design.md
Runbook: docs/runbooks/fan-control.md
[ci skip]
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>