From 99f9bf8d89b7f53e0e145d3ff18ffeafc84036bb Mon Sep 17 00:00:00 2001 From: Viktor Barzin Date: Fri, 5 Jun 2026 06:43:21 +0000 Subject: [PATCH] fan-control: power-tune COOL curve to the 60% efficiency knee Power/temp sweep (2026-06-05) located the cooling-per-watt knee at ~60%: 60->70% buys only -2C for +21W, and 70->100% buys 0C for +54W (the CPU floors ~59C at cluster load, so more airflow does nothing). Re-tune the COOL curve to cap its normal band at 60% (~303W, ~61C); 80/100% become a high-load safety ramp (>=73/79C) before the 83C ceiling. QUIET unchanged (already at the 281W / 4800rpm floor). Saves up to ~75W (~650 kWh/yr) vs full-tilt for the last ~2C. Tests + design doc updated; verified live (63C, 60%, ~267W). [ci skip] Co-Authored-By: Claude Opus 4.8 --- .../2026-06-04-pve-fan-control-design.md | 38 ++++++++++++++++--- scripts/fan-control.sh | 6 ++- scripts/test-fan-control.sh | 34 ++++++++--------- 3 files changed, 54 insertions(+), 24 deletions(-) diff --git a/docs/plans/2026-06-04-pve-fan-control-design.md b/docs/plans/2026-06-04-pve-fan-control-design.md index ed4c2ae7..004cafff 100644 --- a/docs/plans/2026-06-04-pve-fan-control-design.md +++ b/docs/plans/2026-06-04-pve-fan-control-design.md @@ -28,6 +28,31 @@ At a comparable CPU load (~45–53 % busy): Best °C-per-RPM is the first step; beyond ~70 % it is mostly noise. ~16°C of swing is available. +## Power characterization (sweep 2026-06-05) + +Averaged wall power (iDRAC DCMI) + temp at each fan setting: + +| Fan | RPM | Power | CPU | load | +|-----|-----|-------|-----|------| +| auto | 7,080 | 296 W | 68°C | 21 | +| 20 % | 4,800 | 281 W | 73°C | 20 | +| 30 % | 6,360 | 288 W | 72°C | 19 | +| 50 % | 9,360 | 299 W | 65°C | 18 | +| 60 % | 11,040 | 303 W | 61°C | 17 | +| 70 % | 12,720 | 324 W | 59°C | 16 | +| 100 % | 16,920 | 378 W | 59°C | 17 | + +**The cooling-per-watt knee is ~60 %.** Fan power follows ~RPM³: 60→70 % costs ++21 W for −2°C; 70→100 % costs **+54 W for 0°C** (the CPU floors ~59°C at cluster +load — more airflow does nothing). Full speed draws ~97 W (~850 kWh/yr) over the +floor and buys nothing past 60 %. + +**Decision (2026-06-05):** the COOL curve caps its normal band at 60 % (~303 W, +~61°C) — capturing essentially all achievable cooling while avoiding the wasteful +80–100 % zone, now reserved as a high-load safety ramp (≥73/79°C) before the 83°C +ceiling. QUIET is unchanged (already at the low-power floor: 20 % / 4,800 RPM / +281 W). Verified live after re-tune: 63°C, 60 %, ~267 W. + ## Decisions 1. **Custom bash daemon + systemd service**, deployed to the PVE host the same @@ -45,15 +70,16 @@ swing is available. `HOLD_SECS` (15 min) ⇒ someone's around ⇒ QUIET; otherwise COOL. `house_mode` was rejected — it tracks *apartment* occupancy, irrelevant to garage noise. -4. **Two curves**, picked by presence: +4. **Two curves**, picked by presence (COOL power-tuned 2026-06-05 — see + "Power characterization" below): | CPU °C | COOL % (empty) | CPU °C | QUIET % (occupied) | |--------|----------------|--------|--------------------| - | ≤52 | 25 | ≤72 | 20 (≈silent floor) | - | 53–60 | 45 | 73–77 | 40 | - | 61–67 | 65 | 78–81 | 65 | - | 68–73 | 85 | ≥82 | 100 | - | ≥74 | 100 | | | + | ≤54 | 30 | ≤72 | 20 (≈silent floor) | + | 55–63 | 50 | 73–77 | 40 | + | 64–72 | 60 (knee) | 78–81 | 65 | + | 73–78 | 80 | ≥82 | 100 | + | ≥79 | 100 | | | 3°C downward hysteresis prevents flapping at band edges (ramp up immediately, step down only once the curve still wants lower 3°C hotter). diff --git a/scripts/fan-control.sh b/scripts/fan-control.sh index 95942635..a0564d4d 100644 --- a/scripts/fan-control.sh +++ b/scripts/fan-control.sh @@ -43,7 +43,11 @@ set -uo pipefail : "${RUN_ONCE:=0}" # 1 => one iteration then exit (testing) # Curves as "min_temp:pct" entries, descending; first whose min_temp <= temp wins. -COOL_CURVE=(74:100 68:85 61:65 53:45 0:25) +# COOL is power-tuned (2026-06-05 power/temp sweep): the cooling-per-watt knee is +# ~60% — beyond it airflow buys almost nothing (60->70% = +21W/-2°C, 70->100% = +# +54W/0°C; the CPU floors ~59°C at cluster load). So the normal band caps at 60% +# (~303W, ~61°C); 80/100% are a high-load safety ramp before the 83°C ceiling. +COOL_CURVE=(79:100 73:80 64:60 55:50 0:30) QUIET_CURVE=(82:100 78:65 73:40 0:20) log() { printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$*"; } diff --git a/scripts/test-fan-control.sh b/scripts/test-fan-control.sh index 1246f588..be520130 100644 --- a/scripts/test-fan-control.sh +++ b/scripts/test-fan-control.sh @@ -15,16 +15,16 @@ eq() { # fi } -# --- COOL curve --- -eq "cool 40 -> 25" 25 "$(fc_curve cool 40)" -eq "cool 52 -> 25" 25 "$(fc_curve cool 52)" -eq "cool 53 -> 45" 45 "$(fc_curve cool 53)" -eq "cool 60 -> 45" 45 "$(fc_curve cool 60)" -eq "cool 61 -> 65" 65 "$(fc_curve cool 61)" -eq "cool 67 -> 65" 65 "$(fc_curve cool 67)" -eq "cool 68 -> 85" 85 "$(fc_curve cool 68)" -eq "cool 73 -> 85" 85 "$(fc_curve cool 73)" -eq "cool 74 -> 100" 100 "$(fc_curve cool 74)" +# --- COOL curve (power-tuned 2026-06-05: knee at 60%) --- +eq "cool 40 -> 30" 30 "$(fc_curve cool 40)" +eq "cool 54 -> 30" 30 "$(fc_curve cool 54)" +eq "cool 55 -> 50" 50 "$(fc_curve cool 55)" +eq "cool 63 -> 50" 50 "$(fc_curve cool 63)" +eq "cool 64 -> 60" 60 "$(fc_curve cool 64)" +eq "cool 72 -> 60" 60 "$(fc_curve cool 72)" +eq "cool 73 -> 80" 80 "$(fc_curve cool 73)" +eq "cool 78 -> 80" 80 "$(fc_curve cool 78)" +eq "cool 79 -> 100" 100 "$(fc_curve cool 79)" eq "cool 91 -> 100" 100 "$(fc_curve cool 91)" # --- QUIET curve --- @@ -37,13 +37,13 @@ eq "quiet 81 -> 65" 65 "$(fc_curve quiet 81)" eq "quiet 82 -> 100" 100 "$(fc_curve quiet 82)" # --- decide: hysteresis --- -eq "decide uninit -> target" 85 "$(fc_decide cool 68 -1 3)" -eq "decide ramp up now" 85 "$(fc_decide cool 68 25 3)" -eq "decide equal holds" 65 "$(fc_decide cool 65 65 3)" -eq "decide down held in band" 85 "$(fc_decide cool 67 85 3)" # 67+3=70 still 85% -> hold -eq "decide down past band" 65 "$(fc_decide cool 64 85 3)" # 64+3=67 -> 65% < 85 -> drop -eq "decide 100 holds at 71" 100 "$(fc_decide cool 71 100 3)" # 71+3=74 -> 100 -> hold -eq "decide 100 drops at 70" 85 "$(fc_decide cool 70 100 3)" # 70+3=73 -> 85 < 100 -> drop +eq "decide uninit -> target" 60 "$(fc_decide cool 68 -1 3)" +eq "decide ramp up now" 60 "$(fc_decide cool 68 25 3)" +eq "decide equal holds" 60 "$(fc_decide cool 64 60 3)" +eq "decide down held in band" 80 "$(fc_decide cool 70 80 3)" # 70+3=73 still 80% -> hold +eq "decide down past band" 60 "$(fc_decide cool 69 80 3)" # 69+3=72 -> 60% < 80 -> drop +eq "decide 100 holds" 100 "$(fc_decide cool 77 100 3)" # 77+3=80 -> 100 -> hold +eq "decide 100 drops" 80 "$(fc_decide cool 75 100 3)" # 75+3=78 -> 80 < 100 -> drop # --- presence --- now=1000000