fan-control: power-tune COOL curve to the 60% efficiency knee

Power/temp sweep (2026-06-05) located the cooling-per-watt knee at ~60%:
60->70% buys only -2C for +21W, and 70->100% buys 0C for +54W (the CPU
floors ~59C at cluster load, so more airflow does nothing). Re-tune the
COOL curve to cap its normal band at 60% (~303W, ~61C); 80/100% become a
high-load safety ramp (>=73/79C) before the 83C ceiling. QUIET unchanged
(already at the 281W / 4800rpm floor). Saves up to ~75W (~650 kWh/yr) vs
full-tilt for the last ~2C. Tests + design doc updated; verified live
(63C, 60%, ~267W).

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-06-05 06:43:21 +00:00
parent 17da37cea3
commit 99f9bf8d89
3 changed files with 54 additions and 24 deletions

View file

@ -28,6 +28,31 @@ At a comparable CPU load (~4553 % busy):
Best °C-per-RPM is the first step; beyond ~70 % it is mostly noise. ~16°C of
swing is available.
## Power characterization (sweep 2026-06-05)
Averaged wall power (iDRAC DCMI) + temp at each fan setting:
| Fan | RPM | Power | CPU | load |
|-----|-----|-------|-----|------|
| auto | 7,080 | 296 W | 68°C | 21 |
| 20 % | 4,800 | 281 W | 73°C | 20 |
| 30 % | 6,360 | 288 W | 72°C | 19 |
| 50 % | 9,360 | 299 W | 65°C | 18 |
| 60 % | 11,040 | 303 W | 61°C | 17 |
| 70 % | 12,720 | 324 W | 59°C | 16 |
| 100 % | 16,920 | 378 W | 59°C | 17 |
**The cooling-per-watt knee is ~60 %.** Fan power follows ~RPM³: 60→70 % costs
+21 W for 2°C; 70→100 % costs **+54 W for 0°C** (the CPU floors ~59°C at cluster
load — more airflow does nothing). Full speed draws ~97 W (~850 kWh/yr) over the
floor and buys nothing past 60 %.
**Decision (2026-06-05):** the COOL curve caps its normal band at 60 % (~303 W,
~61°C) — capturing essentially all achievable cooling while avoiding the wasteful
80100 % zone, now reserved as a high-load safety ramp (≥73/79°C) before the 83°C
ceiling. QUIET is unchanged (already at the low-power floor: 20 % / 4,800 RPM /
281 W). Verified live after re-tune: 63°C, 60 %, ~267 W.
## Decisions
1. **Custom bash daemon + systemd service**, deployed to the PVE host the same
@ -45,15 +70,16 @@ swing is available.
`HOLD_SECS` (15 min) ⇒ someone's around ⇒ QUIET; otherwise COOL.
`house_mode` was rejected — it tracks *apartment* occupancy, irrelevant to
garage noise.
4. **Two curves**, picked by presence:
4. **Two curves**, picked by presence (COOL power-tuned 2026-06-05 — see
"Power characterization" below):
| CPU °C | COOL % (empty) | CPU °C | QUIET % (occupied) |
|--------|----------------|--------|--------------------|
| ≤52 | 25 | ≤72 | 20 (≈silent floor) |
| 5360 | 45 | 7377 | 40 |
| 6167 | 65 | 7881 | 65 |
| 6873 | 85 | ≥82 | 100 |
| ≥74 | 100 | | |
| ≤54 | 30 | ≤72 | 20 (≈silent floor) |
| 5563 | 50 | 7377 | 40 |
| 6472 | 60 (knee) | 7881 | 65 |
| 7378 | 80 | ≥82 | 100 |
| ≥79 | 100 | | |
3°C downward hysteresis prevents flapping at band edges (ramp up immediately,
step down only once the curve still wants lower 3°C hotter).