fan-control: power-tune COOL curve to the 60% efficiency knee

Power/temp sweep (2026-06-05) located the cooling-per-watt knee at ~60%:
60->70% buys only -2C for +21W, and 70->100% buys 0C for +54W (the CPU
floors ~59C at cluster load, so more airflow does nothing). Re-tune the
COOL curve to cap its normal band at 60% (~303W, ~61C); 80/100% become a
high-load safety ramp (>=73/79C) before the 83C ceiling. QUIET unchanged
(already at the 281W / 4800rpm floor). Saves up to ~75W (~650 kWh/yr) vs
full-tilt for the last ~2C. Tests + design doc updated; verified live
(63C, 60%, ~267W).

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-06-05 06:43:21 +00:00
parent 17da37cea3
commit 99f9bf8d89
3 changed files with 54 additions and 24 deletions

View file

@ -28,6 +28,31 @@ At a comparable CPU load (~4553 % busy):
Best °C-per-RPM is the first step; beyond ~70 % it is mostly noise. ~16°C of
swing is available.
## Power characterization (sweep 2026-06-05)
Averaged wall power (iDRAC DCMI) + temp at each fan setting:
| Fan | RPM | Power | CPU | load |
|-----|-----|-------|-----|------|
| auto | 7,080 | 296 W | 68°C | 21 |
| 20 % | 4,800 | 281 W | 73°C | 20 |
| 30 % | 6,360 | 288 W | 72°C | 19 |
| 50 % | 9,360 | 299 W | 65°C | 18 |
| 60 % | 11,040 | 303 W | 61°C | 17 |
| 70 % | 12,720 | 324 W | 59°C | 16 |
| 100 % | 16,920 | 378 W | 59°C | 17 |
**The cooling-per-watt knee is ~60 %.** Fan power follows ~RPM³: 60→70 % costs
+21 W for 2°C; 70→100 % costs **+54 W for 0°C** (the CPU floors ~59°C at cluster
load — more airflow does nothing). Full speed draws ~97 W (~850 kWh/yr) over the
floor and buys nothing past 60 %.
**Decision (2026-06-05):** the COOL curve caps its normal band at 60 % (~303 W,
~61°C) — capturing essentially all achievable cooling while avoiding the wasteful
80100 % zone, now reserved as a high-load safety ramp (≥73/79°C) before the 83°C
ceiling. QUIET is unchanged (already at the low-power floor: 20 % / 4,800 RPM /
281 W). Verified live after re-tune: 63°C, 60 %, ~267 W.
## Decisions
1. **Custom bash daemon + systemd service**, deployed to the PVE host the same
@ -45,15 +70,16 @@ swing is available.
`HOLD_SECS` (15 min) ⇒ someone's around ⇒ QUIET; otherwise COOL.
`house_mode` was rejected — it tracks *apartment* occupancy, irrelevant to
garage noise.
4. **Two curves**, picked by presence:
4. **Two curves**, picked by presence (COOL power-tuned 2026-06-05 — see
"Power characterization" below):
| CPU °C | COOL % (empty) | CPU °C | QUIET % (occupied) |
|--------|----------------|--------|--------------------|
| ≤52 | 25 | ≤72 | 20 (≈silent floor) |
| 5360 | 45 | 7377 | 40 |
| 6167 | 65 | 7881 | 65 |
| 6873 | 85 | ≥82 | 100 |
| ≥74 | 100 | | |
| ≤54 | 30 | ≤72 | 20 (≈silent floor) |
| 5563 | 50 | 7377 | 40 |
| 6472 | 60 (knee) | 7881 | 65 |
| 7378 | 80 | ≥82 | 100 |
| ≥79 | 100 | | |
3°C downward hysteresis prevents flapping at band edges (ramp up immediately,
step down only once the curve still wants lower 3°C hotter).

View file

@ -43,7 +43,11 @@ set -uo pipefail
: "${RUN_ONCE:=0}" # 1 => one iteration then exit (testing)
# Curves as "min_temp:pct" entries, descending; first whose min_temp <= temp wins.
COOL_CURVE=(74:100 68:85 61:65 53:45 0:25)
# COOL is power-tuned (2026-06-05 power/temp sweep): the cooling-per-watt knee is
# ~60% — beyond it airflow buys almost nothing (60->70% = +21W/-2°C, 70->100% =
# +54W/0°C; the CPU floors ~59°C at cluster load). So the normal band caps at 60%
# (~303W, ~61°C); 80/100% are a high-load safety ramp before the 83°C ceiling.
COOL_CURVE=(79:100 73:80 64:60 55:50 0:30)
QUIET_CURVE=(82:100 78:65 73:40 0:20)
log() { printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$*"; }

View file

@ -15,16 +15,16 @@ eq() { # <description> <expected> <actual>
fi
}
# --- COOL curve ---
eq "cool 40 -> 25" 25 "$(fc_curve cool 40)"
eq "cool 52 -> 25" 25 "$(fc_curve cool 52)"
eq "cool 53 -> 45" 45 "$(fc_curve cool 53)"
eq "cool 60 -> 45" 45 "$(fc_curve cool 60)"
eq "cool 61 -> 65" 65 "$(fc_curve cool 61)"
eq "cool 67 -> 65" 65 "$(fc_curve cool 67)"
eq "cool 68 -> 85" 85 "$(fc_curve cool 68)"
eq "cool 73 -> 85" 85 "$(fc_curve cool 73)"
eq "cool 74 -> 100" 100 "$(fc_curve cool 74)"
# --- COOL curve (power-tuned 2026-06-05: knee at 60%) ---
eq "cool 40 -> 30" 30 "$(fc_curve cool 40)"
eq "cool 54 -> 30" 30 "$(fc_curve cool 54)"
eq "cool 55 -> 50" 50 "$(fc_curve cool 55)"
eq "cool 63 -> 50" 50 "$(fc_curve cool 63)"
eq "cool 64 -> 60" 60 "$(fc_curve cool 64)"
eq "cool 72 -> 60" 60 "$(fc_curve cool 72)"
eq "cool 73 -> 80" 80 "$(fc_curve cool 73)"
eq "cool 78 -> 80" 80 "$(fc_curve cool 78)"
eq "cool 79 -> 100" 100 "$(fc_curve cool 79)"
eq "cool 91 -> 100" 100 "$(fc_curve cool 91)"
# --- QUIET curve ---
@ -37,13 +37,13 @@ eq "quiet 81 -> 65" 65 "$(fc_curve quiet 81)"
eq "quiet 82 -> 100" 100 "$(fc_curve quiet 82)"
# --- decide: hysteresis ---
eq "decide uninit -> target" 85 "$(fc_decide cool 68 -1 3)"
eq "decide ramp up now" 85 "$(fc_decide cool 68 25 3)"
eq "decide equal holds" 65 "$(fc_decide cool 65 65 3)"
eq "decide down held in band" 85 "$(fc_decide cool 67 85 3)" # 67+3=70 still 85% -> hold
eq "decide down past band" 65 "$(fc_decide cool 64 85 3)" # 64+3=67 -> 65% < 85 -> drop
eq "decide 100 holds at 71" 100 "$(fc_decide cool 71 100 3)" # 71+3=74 -> 100 -> hold
eq "decide 100 drops at 70" 85 "$(fc_decide cool 70 100 3)" # 70+3=73 -> 85 < 100 -> drop
eq "decide uninit -> target" 60 "$(fc_decide cool 68 -1 3)"
eq "decide ramp up now" 60 "$(fc_decide cool 68 25 3)"
eq "decide equal holds" 60 "$(fc_decide cool 64 60 3)"
eq "decide down held in band" 80 "$(fc_decide cool 70 80 3)" # 70+3=73 still 80% -> hold
eq "decide down past band" 60 "$(fc_decide cool 69 80 3)" # 69+3=72 -> 60% < 80 -> drop
eq "decide 100 holds" 100 "$(fc_decide cool 77 100 3)" # 77+3=80 -> 100 -> hold
eq "decide 100 drops" 80 "$(fc_decide cool 75 100 3)" # 75+3=78 -> 80 < 100 -> drop
# --- presence ---
now=1000000