[monitoring] Delete orphan server-power-cycle/main.sh with iDRAC default creds

## Context
stacks/monitoring/modules/monitoring/server-power-cycle/main.sh is an old
shell implementation of a power-cycle watchdog that polled the Dell iDRAC
on 192.168.1.4 for PSU voltage. It hardcoded the Dell iDRAC default
credentials (root:calvin) in 5 `curl -u root:calvin` calls. Both remotes
are public, so those credentials — and the implicit statement that 'this
host has not rotated the default BMC password' — have been exposed.

The current implementation is main.py in the same directory. It reads
iDRAC credentials from the environment variables `idrac_user` and
`idrac_password` (see module's iDRAC_USER_ENV_VAR / iDRAC_PASSWORD_ENV_VAR
constants), which are populated from Vault via ExternalSecret at runtime.
main.sh is not referenced by any Terraform, ConfigMap, or deploy script —
grep confirms no `file()` / `templatefile()` / `filebase64()` call loads
it, and no hand-rolled shell wrapper invokes it.

## This change
- git rm stacks/monitoring/modules/monitoring/server-power-cycle/main.sh

main.py is retained unchanged.

## What is NOT in this change
- iDRAC password rotation on 192.168.1.4. The BMC should be moved off the
  vendor default `calvin` regardless; rotation is tracked in the broader
  remediation plan and in the iDRAC web UI.
- A separate finding in stacks/monitoring/modules/monitoring/idrac.tf
  (the redfish-exporter ConfigMap has `default: username: root, password:
  calvin` as a fallback for iDRAC hosts not explicitly listed) is NOT
  addressed here — filed as its own task so the fix (drop the default
  block vs. source from env) can be considered in isolation.
- Git-history scrub of main.sh is pending the broader filter-repo pass.

## Test plan
### Automated
  $ grep -rn 'server-power-cycle/main\.sh\|main\.sh' \
       --include='*.tf' --include='*.hcl' --include='*.yaml' \
       --include='*.yml' --include='*.sh'
  (no consumer references)

### Manual Verification
1. `git show HEAD --stat` shows only the one deletion.
2. `test ! -e stacks/monitoring/modules/monitoring/server-power-cycle/main.sh`
3. `kubectl -n monitoring get deploy idrac-redfish-exporter` still shows
   the exporter running — unrelated to this file.
4. main.py continues to run its watchdog loop without regression, because
   it was never coupled to main.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-04-17 19:42:55 +00:00
parent e81e836d3a
commit d91fbd4a60

View file

@ -1,66 +0,0 @@
#!/bin/sh
tag=server-power-cycle-script
logger -t $tag start $(date '+%F-%R')
if [ -f /tmp/server-power-cycle-lock ]; then
logger -t $tag 'Script already running. exiting'
exit 0
fi
touch /tmp/server-power-cycle-lock
if [ -f /root/server-power-cycle/state.off ]; then
logger -t $tag 'Server state set to off'
while true; do
sleep 60 # sleep 1 minute
logger -t $tag 'Trying to connect to idrac system...'
curl --connect-timeout 5 -s -k -u root:calvin -H"Content-type: application/json" -X GET https://192.168.1.4/redfish/v1/Chassis/System.Embedded.1/Power/PowerSupplies/PSU.Slot.2
if [[ $? -eq 0 ]]; then
logger -t $tag "Connected to idrac, assuming power is back on"
logger -t $tag "Power supply restored, sending power on command"
curl -s -k -u root:calvin -X POST -d '{"Action": "Reset", "ResetType": "On"}' -H"Content-type: application/json" https://192.168.1.4/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset
rm /root/server-power-cycle/state.off
logger -t $tag end $(date '+%F-%R')
rm /tmp/server-power-cycle-lock
exit 0
fi
done
fi
voltage=$(curl -s -k -u root:calvin -H"Content-type: application/json" -X GET https://192.168.1.4/redfish/v1/Chassis/System.Embedded.1/Power/PowerSupplies/PSU.Slot.2 |jq .LineInputVoltage)
# check input voltage on the pwoer supply connected to the outer system
if [[ $voltage -gt 0 ]]; then
logger -t $tag "power supply is on. exiting"
logger -t $tag end $(date '+%F-%R')
rm /tmp/server-power-cycle-lock
exit 0
fi
to_wait=30
echo "Continuously checking power supply for the next $to_wait minutes"
for i in $(seq 30); do
logger -t $tag "Sleeping a minute..Minute $i"
sleep 60
# check input voltage on the pwoer supply connected to the outer system
voltage=$(curl -s -k -u root:calvin -H"Content-type: application/json" -X GET https://192.168.1.4/redfish/v1/Chassis/System.Embedded.1/Power/PowerSupplies/PSU.Slot.2 |jq .LineInputVoltage)
if [[ $voltage -gt 0 ]]; then
logger -t $tag "power supply is on. exiting"
logger -t $tag end $(date '+%F-%R')
rm /tmp/server-power-cycle-lock
exit 0
fi
done
logger -t $tag "Power supply did not come back, sending graceful shutdown signal"
curl -s -k -u root:calvin -X POST -d '{"Action": "Reset", "ResetType": "GracefulShutdown"}' -H"Content-type: application/json" https://192.168.1.4/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset
touch /root/server-power-cycle/state.off
rm /tmp/server-power-cycle-lock
logger -t $tag end $(date '+%F-%R')