Runbook — PVE R730 fan-control daemon
Presence-aware IPMI fan controller on the PVE host (192.168.1.127). In `auto` it runs the fans on the cool curve when the garage is empty and on the quiet curve when someone is in the garage (inferred from recent garage-door activity). Design: `infra/docs/plans/2026-06-04-pve-fan-control-design.md`.
What it is
- `/usr/local/bin/fan-control`: bash daemon (source: `infra/scripts/fan-control.sh`).
- `fan-control.service`: systemd unit (`Type=simple`, restarts on failure).
- `/etc/fan-control.env`: config, including the ha-sofia token (`chmod 600`, not in git).
HA control (Home Assistant)
The daemon polls two ha-sofia helpers each loop, so you can drive the fans from HA (dashboard-it → "Server" view → Fans). A third helper, the Fan Lock, lives purely on the HA side and only gates the auto-revert automation:

- `input_select.r730_fan_mode`: `auto` (garage-presence curve, default), `cool` / `quiet` (force that curve), or `manual` (hold a fixed %).
- `input_number.r730_fan_manual_pct`: the % used in `manual` mode (slider).
- `input_boolean.r730_fan_lock`: locks the current override so the 60-min auto-revert leaves it alone (a 🔒 banner shows on the view while engaged).
Any non-`auto` override reverts to `auto` after 60 min (`automation.r730_fan_mode_auto_revert` on ha-sofia), so a forgotten override can't run the fans wrong indefinitely. The exception is the Fan Lock (`input_boolean.r730_fan_lock`, a toggle on the same view). While it is on, the override persists and a "🔒 FAN CONTROL LOCKED" banner appears on the view as a reminder to unlock. Unlocking restarts the 60-min timer. The automation re-checks the lock when the hour is up, so locking mid-countdown also cancels the pending revert. The ceiling (`CEILING`, 83 °C) still overrides everything and hands the fans back to Dell auto; the lock does not defeat it. An HA change takes effect within one daemon loop (~15 s).
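The precedence described above is: the ceiling first, then a forced HA mode, then garage presence in `auto`. It can be sketched as a pure shell function. This is an illustrative sketch, not the daemon's source. The function name `pick_curve` and its argument order are hypothetical. It assumes the daemon has already read the CPU temperature, the `input_select.r730_fan_mode` value and a presence flag. It also assumes that reaching `CEILING` exactly already counts as a breach.

```sh
# Hypothetical sketch of the decision order; not the real fan-control source.
# Args: temp_c ceiling_c ha_mode present(0|1)
# Prints the effective curve: fallback | cool | quiet | manual
pick_curve() {
  local temp=$1 ceiling=$2 ha_mode=$3 present=$4
  # 1. The ceiling beats everything, including a locked HA override.
  #    Assumed ">=": hitting CEILING exactly already counts as a breach.
  if [ "$temp" -ge "$ceiling" ]; then
    echo fallback    # daemon hands control back to Dell auto
    return 0
  fi
  case $ha_mode in
    cool|quiet|manual) echo "$ha_mode" ;;   # 2. a forced HA mode wins over presence
    *)                                       # 3. auto (or unrecognised): follow presence
      if [ "$present" -eq 1 ]; then echo quiet; else echo cool; fi ;;
  esac
}
```

For example, `pick_curve 60 83 auto 1` prints `quiet` and `pick_curve 85 83 manual 0` prints `fallback`.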
Monitoring sensors on the same view: `sensor.r730_fan_speed` (redfish exporter), plus `sensor.r730_fan_control_target`, `sensor.r730_fan_control_mode` and `sensor.r730_fan_power_est` (Pushgateway). `r730_fan_power_est` is an ESTIMATE of total fan power, because the iDRAC reports no per-fan power. It is modelled from RPM via the fan affinity law (power ∝ RPM³) and calibrated against the power sweep (~2 W at the floor, ~99 W at full speed).
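One plausible form of such an affinity-law model makes the estimate concrete: a fixed floor plus a cubic term, scaled so that full speed lands on the sweep's ~99 W. The offset-plus-cube form, the function name `fan_power_est` and the `RPM_FULL` default are assumptions for illustration. `RPM_FULL` in particular is a placeholder, not a measured R730 value, and the model actually behind the sensor may differ.

```sh
# Hypothetical affinity-law estimate: fan power grows with the cube of RPM.
# Assumed form: P(rpm) = P_FLOOR + (P_FULL - P_FLOOR) * (rpm / RPM_FULL)^3
# P_FLOOR / P_FULL follow the ~2 W / ~99 W sweep; RPM_FULL is a placeholder.
P_FLOOR=${P_FLOOR:-2}
P_FULL=${P_FULL:-99}
RPM_FULL=${RPM_FULL:-15000}

fan_power_est() {
  # Plain sh has no floating point, so awk does the maths.
  awk -v rpm="$1" -v lo="$P_FLOOR" -v hi="$P_FULL" -v full="$RPM_FULL" 'BEGIN {
    r = rpm / full
    if (r > 1) r = 1    # never extrapolate beyond the calibrated top end
    if (r < 0) r = 0
    printf "%.1f\n", lo + (hi - lo) * r * r * r
  }'
}
```

The cube is what makes the top end so expensive. With the placeholder `RPM_FULL`, half speed (`fan_power_est 7500`) comes out at about 14.1 W, roughly a seventh of full-speed power.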
The HA objects (helpers, the auto-revert automation, the REST sensors in `rest_resources/{idrac_redfish_exporter,fan_control}.yaml`, and the dashboard cards) live on ha-sofia and are auto-git-tracked there by the version-control add-on; they are NOT in this repo.
Quick status
```sh
ssh root@192.168.1.127 systemctl status fan-control
ssh root@192.168.1.127 'journalctl -u fan-control -n 30 --no-pager'
ssh root@192.168.1.127 'ipmitool sdr type fan | grep ^Fan1; ipmitool sdr type temperature | grep "^Temp "'
```
Log lines look like `temp=60C ha_mode=auto eff=cool fan=50% (was 70%)`, where `ha_mode` is the HA setpoint and `eff` is the curve actually applied.
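When scripting against those log lines, for example to check which curve is in effect, a small helper can pull one `key=value` field out of a line. This sketch assumes the space-separated field format shown above. `log_field` is a hypothetical name, and the commented `journalctl` call is just one way to feed it.

```sh
# Hypothetical helper: print the value of one key=value field from a fan-control log line.
# Assumes space-separated fields, e.g.: temp=60C ha_mode=auto eff=cool fan=50% (was 70%)
# Returns 1 if the key is absent.
log_field() {
  printf '%s\n' "$1" | awk -v k="$2" '
    { for (i = 1; i <= NF; i++)
        if (index($i, k "=") == 1) { print substr($i, length(k) + 2); found = 1; exit } }
    END { exit !found }'
}

# On the PVE host, e.g. the effective curve from the latest line:
#   log_field "$(journalctl -u fan-control -n 1 --no-pager -o cat)" eff
```

The prefix match is anchored at the start of each field. That is why asking for `mode` does not accidentally match `ha_mode=auto`.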
Disable / roll back to stock firmware control
```sh
ssh root@192.168.1.127 'systemctl disable --now fan-control && ipmitool raw 0x30 0x30 0x01 0x01'
```
The unit's `ExecStopPost` already restores Dell auto on stop, so the explicit `ipmitool raw` at the end is belt-and-suspenders. Either way the box is back on its stock firmware curve.
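For context, the rollback uses Dell's OEM raw fan command. Dell does not officially document the commonly used community encoding for PowerEdge servers, which is:

- `0x30 0x30 0x01 0x01`: hand control back to the firmware (this matches the command above).
- `0x30 0x30 0x01 0x00`: take manual control.
- `0x30 0x30 0x02 0xff <pct>`: set all fans to a percentage.

The last two come from that community usage rather than from this repo. The sketch below only builds and prints the commands as a dry run. `ipmi_fan_cmd` is a hypothetical name, and the daemon source remains the authority on what it actually sends.

```sh
# Hypothetical dry-run helper: print the ipmitool raw command for a fan action.
# Byte encoding is the widely used community one for Dell PowerEdge, not official Dell docs.
ipmi_fan_cmd() {
  case $1 in
    auto)   echo "ipmitool raw 0x30 0x30 0x01 0x01" ;;   # back to Dell firmware curve
    manual) echo "ipmitool raw 0x30 0x30 0x01 0x00" ;;   # take manual control
    set)
      case ${2-} in
        0|[1-9]|[1-9][0-9]|100) ;;                       # 0..100, no leading zeros
        *) echo "bad percent: ${2-}" >&2; return 1 ;;
      esac
      # 0xff = all fans; duty cycle as a hex byte
      printf 'ipmitool raw 0x30 0x30 0x02 0xff 0x%02x\n' "$2" ;;
    *) echo "usage: ipmi_fan_cmd auto|manual|set <pct>" >&2; return 1 ;;
  esac
}
```

For example, `ipmi_fan_cmd set 50` prints `ipmitool raw 0x30 0x30 0x02 0xff 0x32`.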
Tune
Edit `/etc/fan-control.env` on the host, then `systemctl restart fan-control`.
Common knobs:

- `HOLD_SECS`: how long to stay quiet after the garage door last moved (default 900 = 15 min).
- `CEILING`: temperature at which the daemon abandons manual IPMI control and lets the firmware take over (default 83 °C).
- Curve shape: linear anchors near the top of the script. `COOL_T_LO`/`COOL_P_LO`/`COOL_T_HI`/`COOL_P_HI` default to 50 °C/30% → 83 °C/100%, and `QUIET_*` to 68 °C/20% → 83 °C/100%. Fan % interpolates linearly between the anchors (this replaced the old discrete step bands).
- `MIN_STEP` (default 3%): smallest fan-% change worth an IPMI write (anti-jitter).
- `DEADBAND` (3 °C): ease-down hysteresis.

Lower `COOL_P_HI` or raise `COOL_T_HI` to run the top end quieter. Raise `COOL_P_LO` or lower `COOL_T_LO` to cool harder at moderate temperatures.
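The linear curve and the `MIN_STEP` rule can be sketched in integer shell arithmetic. This is an illustration under assumptions, not the daemon's code:

- `curve_pct` and `worth_write` are hypothetical names.
- The interpolation truncates rather than rounds.
- Temperatures outside the anchors clamp to the end percentages.
- `DEADBAND` is left out because its exact rule isn't described here.

```sh
# Hypothetical sketch of the linear fan curve: pct goes from p_lo at t_lo to p_hi at t_hi.
# Args: temp t_lo p_lo t_hi p_hi   (integers, °C and %)
curve_pct() {
  local t=$1 t_lo=$2 p_lo=$3 t_hi=$4 p_hi=$5
  if [ "$t" -le "$t_lo" ]; then echo "$p_lo"; return 0; fi   # below low anchor: floor %
  if [ "$t" -ge "$t_hi" ]; then echo "$p_hi"; return 0; fi   # above high anchor: top %
  # Integer interpolation; shell arithmetic truncates toward zero.
  echo $(( p_lo + (t - t_lo) * (p_hi - p_lo) / (t_hi - t_lo) ))
}

# Anti-jitter: succeed only if |target - current| >= min_step (default 3, like MIN_STEP).
# Args: current_pct target_pct [min_step]
worth_write() {
  local diff=$(( $2 - $1 ))
  if [ "$diff" -lt 0 ]; then diff=$(( -diff )); fi
  [ "$diff" -ge "${3:-3}" ]
}
```

With the cool defaults, `curve_pct 60 50 30 83 100` gives 51. With the quiet defaults, `curve_pct 75 68 20 83 100` gives 57. `worth_write 50 52` fails, so no IPMI write would be issued for a 2% change.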
Deploy / update
```sh
cd infra
scp scripts/fan-control.sh root@192.168.1.127:/usr/local/bin/fan-control
ssh root@192.168.1.127 chmod +x /usr/local/bin/fan-control
scp scripts/fan-control.service root@192.168.1.127:/etc/systemd/system/fan-control.service
# first install only: create /etc/fan-control.env (chmod 600) from fan-control.env.example with the HA token,
# and enable the unit on boot with: systemctl enable fan-control
ssh root@192.168.1.127 'systemctl daemon-reload && systemctl restart fan-control'
```
HA token
`/etc/fan-control.env` holds a long-lived ha-sofia token used to read `sensor.garage_door_state_bg` and the `r730_fan_*` helpers. Mint one via Home Assistant → Profile → Security → Long-lived access tokens, or reuse the existing ha-sofia token. If the token is missing or empty, the daemon still runs, but on the cool curve only (no quiet mode), and logs `ha_reachable=0`.
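Here is a minimal sketch of that HA read with the graceful-degradation rule, using Home Assistant's REST endpoint `GET /api/states/<entity_id>`. It makes several assumptions:

- `HA_URL` and `HA_TOKEN` are assumed variable names; the real names in `/etc/fan-control.env` may differ.
- `ha_parse_state` and `ha_get_state` are hypothetical names.
- The `sed` extraction is a dependency-free stand-in for `jq` that is only adequate for the flat top-level `state` field.

```sh
# Hypothetical sketch: read one entity's state from Home Assistant's REST API.
# HA_URL / HA_TOKEN are assumed names; check /etc/fan-control.env for the real ones.
HA_URL=${HA_URL:-http://192.168.1.8:8123}

# Extract the "state" value from an /api/states/<entity_id> JSON body (crude, no jq).
ha_parse_state() {
  printf '%s\n' "$1" | sed -n 's/.*"state": *"\([^"]*\)".*/\1/p'
}

# Prints the state and returns 0; returns 1 when HA can't be used
# (no token, network/HTTP error), so the caller can fall back to cool-only.
ha_get_state() {
  local entity=$1 body
  [ -n "${HA_TOKEN:-}" ] || return 1
  body=$(curl -fsS -m 5 -H "Authorization: Bearer $HA_TOKEN" \
    "$HA_URL/api/states/$entity") || return 1
  ha_parse_state "$body"
}

# Example caller:
#   if door=$(ha_get_state sensor.garage_door_state_bg); then ha_reachable=1; else ha_reachable=0; fi
```

Failing closed to "unreachable" keeps the safe behaviour: without HA, the fans never drop to the quiet curve.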
Symptoms & checks
| Symptom | Check |
|---|---|
| Fans stuck loud | `journalctl -u fan-control`: is it in `mode=fallback` (ceiling breach or IPMI failure)? Check the CPU temperature. Also check that the HA mode isn't forced to `cool` or a high `manual` %. |
| Never goes quiet | Token valid? `curl -H "Authorization: Bearer $TOKEN" http://192.168.1.8:8123/api/states/sensor.garage_door_state_bg`. Is the garage door sensor reporting? Is `input_select.r730_fan_mode` on `auto` (or `quiet`), with no locked override? |
| Fans flapping | Increase `DEADBAND` (and/or `MIN_STEP`). |
| Service won't start | `systemctl status fan-control`; check that `ipmitool` works: `ipmitool sdr type temperature`. |
| Box left in manual after crash | `ipmitool raw 0x30 0x30 0x01 0x01` to force Dell auto. |
Verify presence wiring
```sh
# one iteration, real IPMI + HA, no daemon loop:
ssh root@192.168.1.127 'set -a; . /etc/fan-control.env; set +a; RUN_ONCE=1 /usr/local/bin/fan-control'
```
With `input_select.r730_fan_mode` on `auto` and the garage door unmoved for more than 15 min (the default `HOLD_SECS`), you should see `mode=cool`. Within 15 min of the door moving, you should see `mode=quiet`.
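In `auto`, the quiet/cool split comes down to one comparison against `HOLD_SECS`. This sketch assumes the daemon knows when the door last moved as a Unix timestamp; where that timestamp comes from (for example the entity's `last_changed`) is an assumption here. `presence_curve` is a hypothetical name.

```sh
# Hypothetical sketch: quiet while the garage door moved within hold_secs, else cool.
# Args: now_epoch last_door_move_epoch [hold_secs]
presence_curve() {
  local now=$1 last=$2 hold=${3:-900}   # 900 s = the documented HOLD_SECS default
  if [ $(( now - last )) -lt "$hold" ]; then
    echo quiet
  else
    echo cool
  fi
}

# Example: door moved 10 minutes ago
#   now=$(date +%s); presence_curve "$now" "$(( now - 600 ))"
```

The example prints `quiet`. Once more than 15 minutes have passed since the door moved, the result flips to `cool`.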