Don’t Let the Bear Hibernate

Black bear building up energy stores for hibernation in the author’s backyard last winter

Since my March post, I’ve continued to donate computing resources non-stop to scientific researchers. A byproduct has been learning how my home lab behaves at full throttle and what that means for energy use. My last blog post discussed some of my original sustainability learnings.

Impact on the host at 95% throttle

VMware vRealize Operations Manager showing the additional compute capacity being donated

When I started donating all of my excess compute capacity, CPU usage climbed to approximately 95%. Shortly after the servers began running at full throttle, an alert popped up in the VMware vRealize Operations Manager 8.0 console. The alert offered a proactive performance improvement recommendation, and it also gave me the idea for this blog post.

VMware vRealize Operations Manager 8.0 alert. VMware knowledge base article explaining how to fix the issue shown above.

I learned that the most energy-efficient setting for my home lab servers was to turn off all of the processor power-saving features. This lesson was counterintuitive. Once my home lab was operating at full utilization, the servers wasted processing power and energy by repeatedly engaging their power-saving features. The default server configuration assumed that each burst of work was a momentary spike in demand. As soon as a burst ended, the processor started shutting down excess capacity. Because utilization was so high, the next burst arrived almost immediately, and the processor had to ramp back up to maximum capacity, losing time on every transition. This incorrect assumption reduced effective processing capacity and slowed the scientific research workload. Energy consumption didn’t decrease, but the amount of work completed did.
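To make the ramp-up penalty concrete, here is a toy model in Python. It is a hypothetical sketch, not a measurement from my servers: the `Policy` class, the 19-busy/1-idle tick pattern (roughly 95% utilization), the 3-tick ramp-up, and the 50% ramp speed are all invented for illustration. The model only counts work finished in a fixed window; it does not model power draw.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical CPU power policy used only for this toy model."""
    name: str
    downclocks_when_idle: bool  # True: drop to a low-power state during any idle tick
    ramp_up_ticks: int          # ticks spent at reduced speed after waking up


def work_completed(demand: list[bool], policy: Policy, ramp_speed: float = 0.5) -> float:
    """Return units of work finished over the trace.

    Each True tick is compute-bound (full speed = 1.0 unit of work);
    each False tick is a brief stall with no work available.
    """
    work = 0.0
    ramp_left = 0
    downclocked = False
    for busy in demand:
        if not busy:
            # A brief gap: the power-saving policy treats it as "the spike is over".
            if policy.downclocks_when_idle:
                downclocked = True
            continue
        if downclocked:
            # Demand is back, so the processor must climb back to full speed.
            downclocked = False
            ramp_left = policy.ramp_up_ticks
        if ramp_left > 0:
            work += ramp_speed
            ramp_left -= 1
        else:
            work += 1.0
    return work


# Roughly 95% busy: 19 busy ticks followed by 1 idle tick, repeated 50 times.
demand = ([True] * 19 + [False]) * 50

performance = Policy("power saving disabled", downclocks_when_idle=False, ramp_up_ticks=0)
power_saving = Policy("power saving enabled", downclocks_when_idle=True, ramp_up_ticks=3)

for policy in (performance, power_saving):
    print(f"{policy.name}: {work_completed(demand, policy)} units of work")
```

With these made-up numbers, the power-saving policy finishes 876.5 units of work versus 950.0 with power saving disabled, roughly 8% less. The exact figure means nothing; the point is that at high utilization the short gaps are so frequent that the ramp-up penalty recurs on almost every burst.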

Who Should Sleep?

Sleep states and hibernation help bears and computers conserve energy when nothing is happening. Both go through a “waking-up” phase that takes time and energy. Our Pacific Northwest bears benefit from powering down unnecessary functions in the winter, but a server processor running at full capacity does not. For the processor, sleeping only slows down the workload while wasting energy, which isn’t a sustainable solution.

Turning Off Power Saving Features

The three SuperMicro SuperServer E300-8Ds in my home lab have rudimentary power management features. The P-state and C-state features allow processors to shut down excess capacity: P-states lower clock frequency and voltage, while C-states put idle cores into low-power sleep states. This is similar to an energy-efficient pickup truck engine that shuts off cylinders it doesn’t need at highway cruising speed. Following are the default AMI BIOS P-state and C-state settings for these servers. I disabled both of the highlighted settings.

Sustainable Configuration

The alerts stopped once I configured the servers for compute-intensive workloads running non-stop. Enterprise servers are complex, and their default settings reduce the time and expertise needed to stand up infrastructure. VMware vRealize Operations Manager highlighted this misconfiguration, which I wouldn’t otherwise have found. It is one of many examples where this tool has pointed out a hidden problem and taught me something new. I never expected that turning off all power management features would be the most sustainable option for servers running at full utilization.