I’ve continued contributing computing resources non-stop to science researchers since my March post. A byproduct is learning how my home lab operates at full throttle and the energy implications. My last blog discussed some of my original sustainability learnings.
I drove CPU usage to approximately 95% when I started to donate all of my excess compute capacity. Shortly after operating at full throttle, an alert popped up in the VMware vRealize Operations Manager 8.0 console. This alert provided a proactive performance improvement recommendation, and an idea for this blog post.
I learned that the most energy efficient setting for my home lab servers was to turn off all of the processor energy saving features. This lesson was counterintuitive. Once my home lab was operating at full utilization, the servers wasted processing power and energy by attempting to turn on power saving features. The default server configuration assumed that the current task was a momentary spike in demand. Once the sprint was over, the processor would start shutting down excess capacity. Because utilization was high, more demand arrived almost immediately and the processor had to ramp back up to maximum capacity. This incorrect assumption reduced processing capacity and slowed the scientific research workload. The energy consumption didn’t decrease, but the amount of work completed was reduced.
Who Should Sleep?
Sleep states and hibernation for bears and computers are necessary to save energy stores when nothing is happening. Both species go through a “waking-up” state which takes time and energy. Our Pacific Northwest bears benefit from powering off unnecessary functions in the winter but a server processor at full capacity does not. This only slows down the workload while wasting energy which isn’t a sustainable solution.
Turning Off Power Saving Features
The 3 SuperMicro SuperServer E300-8D servers in my home lab have rudimentary power management features. The p-state and c-state features allow processors to shut down excess capacity. This feature is similar to an energy efficient pickup truck engine which turns off pistons that aren’t needed at highway cruising speed. The following are the default AMI BIOS p-state and c-state settings for these servers. I have disabled both settings that are highlighted.
The alerts stopped once I configured the servers for compute intensive workloads running non-stop. Enterprise servers are complex and default settings reduce the time and understanding required to stand-up infrastructure. VMware vRealize Operations Manager highlighted this mis-configuration which I wouldn’t have found otherwise. This is one example of many where this tool has pointed out hidden problems and taught me something new. I never expected that turning off all power management features is the most sustainable option.
Deploying the VMware Folding@home fling to join the world’s largest distributed supercomputer is a worthwhile and interesting pursuit. Scientific research will require a monumental number of person-years over a long period of time to develop treatments and a vaccine. Standing up the Folding@home software is only the first step. It will take a marathon to win this race.
My previous blog post described how to contribute home lab resources with a negligible impact on performance and responsiveness. This is only the first obstacle to overcome. When I was training for the marathon I “hit the wall” during a 20 mile training run. I lost any motivation to move another step once I depleted all of my energy stores. I learned from this experience and accepted every GU energy supplement offered during the race to finish the Seattle Marathon. Contributing computer resources to researchers isn’t sustainable if your electricity bill doubles. The fear of a large energy bill is also an example of “hitting the wall”. Beyond the personal financial impact, natural resources are inefficiently used if someone else can provide IT resources more efficiently.
In 2003 I attended an event in Pasadena, CA where the late Peter Drucker spoke. Mr. Drucker has been described as “the founder of modern management”.
I learned during his speech how important measurement is to achieve organizational goals. I took his lesson and started measuring to understand whether donating computing resources was a sustainable activity for me. Next I needed to decide what to measure.
Measurement: Electricity usage
All servers, NAS, and networking infrastructure are plugged into a CyberPower CST135XLU UPS I bought at Costco. The UPS measures the electricity used by all of the equipment in the half-rack, not only the servers.
This UPS supports CyberPower’s PowerPanel Business VMware virtual appliance. It provides detailed reporting in addition to a graceful shutdown capability during a power outage to protect my vSAN datastore.
PowerPanel Business logs energy load percentage recorded every 10 minutes. Watts consumed is a calculation of the energy load percentage multiplied by total capacity of the UPS which is 810 watts. For example a reading of 35% energy load represents the use of 283.5 watts.
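The load-to-watts conversion described above can be written as a small helper. This is a minimal sketch: the function name is my own, and the only inputs taken from the post are the 810 watt UPS capacity and the 35% example reading.

```python
UPS_CAPACITY_WATTS = 810  # total UPS capacity stated in the post


def load_percent_to_watts(load_percent, capacity_watts=UPS_CAPACITY_WATTS):
    """Convert a UPS load percentage (0-100) into watts drawn."""
    if not 0 <= load_percent <= 100:
        raise ValueError("load_percent must be between 0 and 100")
    return capacity_watts * load_percent / 100


# The 35% reading from the example above
print(load_percent_to_watts(35))  # 283.5
```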
An Excel pivot table is used to analyze the home lab energy usage data imported from the CyberPower UPS PowerPanel CSV file. The pivot table made it easy to graph, average and total electricity usage per day.
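For readers who prefer a script to a pivot table, here is a rough Python equivalent of the per-day average and total. It is only a sketch: the sample readings and the (timestamp, load percent) layout are hypothetical, not the actual PowerPanel CSV format, and it assumes one reading every 10 minutes as described above.

```python
from collections import defaultdict
from datetime import datetime

UPS_CAPACITY_WATTS = 810        # UPS capacity stated in the post
SAMPLE_INTERVAL_HOURS = 10 / 60  # one reading every 10 minutes


def daily_energy_summary(samples):
    """Group (ISO timestamp, load percent) readings by day.

    Returns {date: (average_watts, kilowatt_hours)}.
    """
    watts_by_day = defaultdict(list)
    for timestamp, load_percent in samples:
        day = datetime.fromisoformat(timestamp).date()
        watts_by_day[day].append(UPS_CAPACITY_WATTS * load_percent / 100)

    summary = {}
    for day, watts in sorted(watts_by_day.items()):
        average_watts = sum(watts) / len(watts)
        # Each reading stands for 10 minutes of draw at that wattage
        kilowatt_hours = sum(watts) * SAMPLE_INTERVAL_HOURS / 1000
        summary[day] = (average_watts, kilowatt_hours)
    return summary


# Hypothetical readings, for illustration only
example = [
    ("2020-04-01T00:00", 30),
    ("2020-04-01T00:10", 32),
    ("2020-04-02T00:00", 35),
]
for day, (avg_watts, kwh) in daily_energy_summary(example).items():
    print(day, f"{avg_watts:.1f} W average", f"{kwh:.4f} kWh")
```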
VMware vSphere measures the cumulative compute capacity of a cluster, which is more tangible than percentage of CPU utilization. In my home lab I have 26.4 GHz of CPU capacity, which is derived as follows:
3 Supermicro SuperServer E300-8D servers each with an Intel Xeon D-1518 CPU
Each Intel Xeon D-1518 CPU has 4 cores running @ 2.20 GHz
Total cluster compute power: 26.4 GHz = 3 servers * 4 cores each * 2.20 GHz
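The same derivation as a few lines of Python, which gives 26.4 GHz. The function name is just for illustration; the inputs are the server count, core count, and clock speed listed above.

```python
def cluster_capacity_ghz(servers, cores_per_server, clock_ghz):
    """Aggregate CPU capacity of a cluster in GHz."""
    return servers * cores_per_server * clock_ghz


total_ghz = cluster_capacity_ghz(servers=3, cores_per_server=4, clock_ghz=2.20)
print(f"{total_ghz:.1f} GHz")  # 26.4 GHz
```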
Baseline energy use – prior to donating compute resources
A 25% CPU utilization baseline prior to donating resources was estimated by eyeballing the vSphere annual home lab CPU performance graph above. The baseline consumes 6.6 GHz of compute, which is 25% of the 26.4 GHz total cluster capacity. CyberPower UPS PowerPanel software reported that electricity cost averaged $21.56 per month for 177 kilowatt-hours during the baseline time period. Puget Sound Energy supplies electricity @ $0.122/kWh including all taxes.
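As a sanity check, the baseline numbers can be recomputed from the figures in this paragraph. This is a sketch with function names of my own choosing. Multiplying the rounded 177 figure by the rate gives about $21.59, a few cents off the $21.56 that PowerPanel reported, presumably because the energy figure is rounded.

```python
TOTAL_CLUSTER_GHZ = 26.4   # 3 servers * 4 cores * 2.20 GHz
RATE_PER_KWH = 0.122       # Puget Sound Energy rate quoted in the post


def baseline_compute_ghz(utilization_fraction, total_ghz=TOTAL_CLUSTER_GHZ):
    """Compute consumed at a given average CPU utilization."""
    return utilization_fraction * total_ghz


def monthly_cost(kilowatt_hours, rate_per_kwh=RATE_PER_KWH):
    """Electricity cost in dollars for a month's energy use."""
    return kilowatt_hours * rate_per_kwh


print(f"{baseline_compute_ghz(0.25):.1f} GHz")  # 6.6 GHz
print(f"${monthly_cost(177):.2f}")              # $21.59
```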
Incremental energy use after donating spare capacity
A surplus of 19.8 GHz of compute capacity is unused in the cluster, which is the remaining 75% of capacity.
The sharp increase to 100% CPU utilization on the far right of the graph is from donating computer resources through the Folding@home fling and Rosetta@home. The entire home lab infrastructure, including servers running 24 hours a day, 7 days a week, consumes the majority of the energy even under a light load. The additional 19.8 GHz of compute work across all 3 servers barely increased electricity costs by $1.80 per 5 kilowatts.
The graph & table below illustrate how donating an incremental 19.8 GHz of compute results in a disproportionately small increase in electricity usage. This seems counterintuitive prior to analyzing the data.
The baseline workload consumed the majority of the electricity usage prior to increasing utilization. This illustrates how underutilized data centers waste a majority of their capacity and energy. Utilizing all of the computing capacity is extremely efficient.
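To put the disproportion in numbers, the sketch below compares the relative increase in compute with the relative increase in cost. It assumes the $1.80 increase applies to the same monthly period as the $21.56 baseline, which is my reading of the figures rather than something the data export confirms.

```python
def percent_increase(baseline, increment):
    """Increment expressed as a percentage of the baseline."""
    return increment / baseline * 100


# Figures from the post: 6.6 GHz baseline plus 19.8 GHz donated,
# $21.56 baseline cost plus $1.80 (assumed to be per month).
compute_increase = percent_increase(6.6, 19.8)
cost_increase = percent_increase(21.56, 1.80)

print(f"Compute delivered: +{compute_increase:.0f}%")
print(f"Electricity cost:  +{cost_increase:.1f}%")
```

Under that assumption, roughly three times more compute costs under a tenth more in electricity.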
A “Muscle Car” Home Lab
Many purchase retired enterprise class servers on eBay to build a home lab. Used enterprise class servers are inexpensive to purchase compared to buying new. Computer enthusiasts enjoy these big iron servers with many blinking lights and loud whirring fans. That’s a lot like how car enthusiasts treasure a muscle car with a powerful engine. These servers have large power supplies with a maximum rating of 400-900 watts.
The power outlet for my home lab is a typical shared 20 amp residential circuit. Three enterprise class servers each pulling 900 watts would require a 22.5 amp circuit @ 120 volts. This power demand would require new electrical wiring and specialized receptacles installed by an electrician. A much larger UPS would also be required. Enterprise servers generate a lot of heat and noise from the cooling fans.
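The amperage figure follows from dividing total watts by voltage. A minimal sketch, with a hypothetical function name and the numbers from the paragraph above:

```python
def required_amps(server_count, watts_per_server, volts=120):
    """Current drawn by a group of servers at full power-supply rating."""
    return server_count * watts_per_server / volts


CIRCUIT_AMPS = 20  # typical shared residential circuit mentioned above

amps = required_amps(server_count=3, watts_per_server=900)
print(f"{amps} A needed, circuit provides {CIRCUIT_AMPS} A")
print("Fits on circuit" if amps <= CIRCUIT_AMPS else "Exceeds circuit")
```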
One of my co-workers has an exhaust fan which draws the heat from his enterprise servers into a vented attic. Snow doesn’t accumulate on his roof above his home lab due to the heat generated.
I don’t expect enterprise class servers to double their electricity usage if the server is already continuously running. I anticipate that the same pattern would exist, where incremental compute resources for Folding@home would have a small energy footprint.
If donating compute time changes the home lab usage pattern it would consume much more energy and easily result in a doubling of an electric bill. Turning on a home lab only for testing, education and practice is a much different use pattern than running a home lab continuously.
A “Green” Home Lab
A goal for my home lab was running it continuously, 24 hours a day, and 7 days a week. Energy efficiency or “Green” became a goal for my home lab after performing an energy cost comparison. A used enterprise server with a low server purchase price could become the most expensive option after assessing the total cost including larger UPS, new high amperage circuits, cooling, and continuous electricity use over many years.
The SuperMicro SuperServer E300-8D’s in my home lab have laptop sized power supplies with a maximum rating of 84 watts. This power supply is approximately 10% to 20% of the capacity of an enterprise server power supply.
These power supplies are compliant with US Department of Energy efficiency level VI, which went into effect in 2016.
This standard requires at least 88% efficiency and the remainder is wasted as heat. Less heat will make it difficult to melt snow on your roof but results in a more sustainable home lab.
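To get a feel for how little heat that is, the sketch below estimates the worst-case waste at full load. It assumes the 84 watt rating is the power supply's output and that it runs at exactly the 88% minimum, so treat the result as a rough upper bound for illustration.

```python
def wall_draw_and_heat(output_watts, efficiency):
    """Return (input watts drawn from the wall, watts lost as heat)."""
    input_watts = output_watts / efficiency
    return input_watts, input_watts - output_watts


# Assumption: 84 W is the rated output, efficiency is the 88% minimum
input_watts, heat_watts = wall_draw_and_heat(output_watts=84, efficiency=0.88)
print(f"Wall draw: {input_watts:.1f} W, heat loss: {heat_watts:.1f} W")
```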
My entire home lab including all of the storage, networking hardware, 2 mini infrastructure servers, and 3 lab servers uses less power than 1 enterprise class server.
Don’t Stop Running
When I ran the Seattle Marathon, I noticed at mile 19 that people around me stopped running and began walking up Capitol Hill from the flat ground along Lake Washington. I kept repeating “keep on running” to myself so I could finish the marathon and keep the momentum going.
Donating excess computer resources in my example is close to free. It inexpensively provides a great deal of value to researchers. Due to the low incremental cost of energy and money, I have the motivation to continue running this long marathon.
My previous blog post described donating home lab compute resources to coronavirus researchers. Will my home lab get bogged down and become painfully unresponsive? This is the first question I had after donating compute resources. Interest in doing good could quickly wane if it becomes difficult to get my work done.
The University of Washington (UW) Institute for Protein Design has a similar project called Rosetta@home. Even though I’m a different UW alumnus (University of Wisconsin, not Washington), I’ve made Seattle my home over the last 12 years. I joined this project to help my neighboring researchers. It’s not as easy as deploying the VMware virtual appliance fling for Folding@home. First I manually created the VMs, deployed Red Hat Enterprise Linux in each VM, updated the OS, and then installed the BOINC package. The BOINC package is available for many other operating systems.
What if I could prioritize my regular home lab work AND use excess capacity for Rosetta@home while I was waiting for the release of new Folding@home workloads? Could I retain my fast and responsive home lab and donate excess resources?
CPUs are always executing instructions regardless of whether they have any work to do. Most of the time they have nothing to do. Instead of filling empty space with the idle process, Folding@home and Rosetta@home can execute instead of the CPU consuming empty calories.
vSphere’s Distributed Resource Scheduler (DRS) ensures that VMs receive the right amount of resources based on the policy defined by the administrator. I reopened my course manual from the VMware Education “vSphere: Install, Configure, Manage plus Optimize and Scale Fast Track [V6.5]” class & exam I completed in 2018 to refresh my memory on the scheduling options available.
Resource Pools & Shares
The above screenshot shows the DRS resource pools defined to achieve my CPU scheduling goals. This example uses vSphere 7, which was released last week; however, this feature has been available for many years. I utilized shares to maximize my CPU utilization by ensuring that the 24 logical CPU cores in my home lab are always busy with work instead of executing an idle process which does nothing.
I defined a higher relative priority for regular workloads and a lower priority for “Community Distributed Computing” workloads. The picture below illustrates how the “Community Distributed Computing Resource Pool” is configured with low shares.
My individual regular workload VMs have normal shares by default, which is a higher relative priority than the low shares resource pool shown above. This results in a negligible impact on performance for my regular workloads. I haven’t noticed the extra load, which is fully utilizing the last drops of processing capacity of my CPUs. Below is a cluster based CPU utilization graph from vRealize Operations 8.0. The 3 CPUs had plenty of unused capacity while they were waiting for Folding@home work units. This is circled in blue, prior to adding Rosetta@home to the cluster. Once I added Rosetta@home with the DRS shares policy, all of the CPU cores in the cluster were fully utilized. This is the area circled in red.
Prioritize Multiple Community Distributed Computing Projects
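The idea behind shares can be illustrated with a toy proportional-share model. This is a simplified sketch, not the actual vSphere scheduler: the pool names, the demand figures, and the 4000 versus 2000 weights (an illustrative 2:1 ratio for normal versus low) are my own assumptions. Shares only matter when there is contention, and any capacity one workload does not need flows to the other.

```python
def allocate_cpu(capacity_ghz, pools):
    """Split capacity among pools in proportion to shares, capped by demand.

    pools maps a name to (shares, demand_ghz). Capacity a pool does not
    need is redistributed to the pools that still want more.
    """
    allocation = {name: 0.0 for name in pools}
    active = {name for name, (_, demand) in pools.items() if demand > 0}
    remaining = capacity_ghz
    while active and remaining > 1e-9:
        total_shares = sum(pools[name][0] for name in active)
        satisfied = set()
        granted = 0.0
        for name in active:
            shares, demand = pools[name]
            offer = remaining * shares / total_shares
            give = min(offer, demand - allocation[name])
            allocation[name] += give
            granted += give
            if allocation[name] >= demand - 1e-9:
                satisfied.add(name)
        remaining -= granted
        if not satisfied:
            break  # every active pool took its full offer; nothing left over
        active -= satisfied
    return allocation


# Light regular load: donated work soaks up everything that is left over
light = allocate_cpu(26.4, {"regular": (4000, 6.6), "community": (2000, 26.4)})
# Heavy regular load: regular work wins a 2:1 split under contention
heavy = allocate_cpu(26.4, {"regular": (4000, 26.4), "community": (2000, 26.4)})
print(light)
print(heavy)
```

In the light case the regular workloads get everything they ask for and the community pool uses the rest. In the heavy case the regular workloads receive twice the community pool's allocation.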
Large enterprise IT customers use these same features to fully utilize their data center resources. A common example is to place production and dev/test workloads on the same cluster, and provide production workloads a higher priority. Enterprise customers improve their data center return on investment since they don’t have underutilized computing resources. Public cloud providers use this same model to sell efficient compute services.
Happy Home Lab
The home lab is happy since it is contributing unused CPU processing power to the community without impacting performance of everything else. My next blog post will describe the sustainability of the solution and impact to my Puget Sound Energy electricity bill.
The global pandemic crisis has quickly mobilized a new volunteer community at technology companies and beyond. This community is providing a vast amount of valuable computing resources to leading biomedical academic researchers. One of the reasons why researchers need resources is to learn how the coronavirus works. This knowledge can help the development of vaccines and medicine to fight it.
I’ve been fortunate to receive help from countless individuals who contributed to building my talents throughout my life. I can’t sew masks like my wife Michelle is doing to help our front line heroes, but I’m contributing my time and talents to donate computer resources and get the word out.
Folding@home is the largest volunteer project contributing unused processing power to biomedical researchers studying human proteins. The technology for this project is similar to the popular SETI@home project that searches for alien life. Both of these projects use unused processing power from anyone who installs their software. Currently the Folding@home project is the largest supercomputer in the world.
Technology companies have vast amounts of computing resources in their data centers, and many of their employees have home labs. These home labs are micro data centers purchased by employees to learn and gain experience with enterprise information technology software. Servers in corporate or micro data centers are sized for maximum demand and often have unused capacity.
An ad-hoc team @ VMware came together to deploy Folding@home both in corporate data centers and in employees’ home labs. This team quickly built and shipped a VMware virtual appliance fling to package the software and make it easy for anyone to deploy. Flings are “engineer pet projects” freely distributed by VMware to the public. Approval was received from Dr. Greg Bowman, the Director of Folding@home, for VMware to host and distribute the virtual appliance with their project. I learned about the fling through an internal Slack channel and quickly deployed it to my 3 servers on March 20th when it was released.
Future Technical Blog Posts
Negligible Impact: A future blog post will explain how the Distributed Resource Scheduler (DRS) enforces my policies to provide Folding@home only excess compute capacity while not degrading my preexisting workloads.
Sustainability: I’ll also describe the energy impact to my home lab of adding these compute intensive Folding@home workloads in a separate post. I’ve taken steps for my home lab to efficiently use electricity and make this project sustainable for me.
Computer and Personal Time: VMware’s customers and many in the technology industry, from the IT channel through the largest technology companies like IBM, Microsoft, Dell, Google, Apple, and Amazon, have already started a response. CRN recently published an article on how the channel partners are jumping in to support the cause. Consider contributing your excess computing capacity from your laptop or your server farm by joining the effort already underway at your company. If you are the first in your organization, deploy the Folding@home VMware virtual appliance fling or the original software directly from Folding@home.
Will I generate an audience? How long will I publish my blog? I decided to operate this blog as inexpensively as possible since I don’t know. The blog solution I cobbled together is almost free after the annual domain name cost.
blog architecture for bitofsnow.com
Longevity & Low Cost: Due to the unknown demand, low cost is a key goal. I migrated www.foxhill.org from a server in my basement to AWS S3 static website hosting 6 years ago. My total cost from AWS for www.foxhill.org has been about $1 for S3 over the entire 6 years. I don’t know about you, but I think that is essentially free. If the blog is low cost, there is little pain to leave the blog up in case of low demand or if I take a break from blogging.
Personalized Domain Name & Flexibility: I like having the choice of any DNS service for my custom domain name. This provides flexibility for future uses I can’t imagine now. It is easier to use a vertically integrated blog SaaS solution, but you may give up full control over your domain name. A blog SaaS has recurring charges for their service due to the work, support, and simplicity they provide to an average non-techie customer.
AWS Public Cloud: I have a VMware vSphere based home lab with 3 robust servers which could easily handle the load of a dynamic web server for my blog. I decided to host the blog in the cloud based on my earlier experience with foxhill.org.
For 15 years I hosted foxhill.org’s web site and email on a server in my basement. In 2014 I migrated this site to the cloud and got out of the hosting business for the following reasons:
Maintenance: Production servers require frequent software patching and keeping the software up to date. My email server would run out of space at inconvenient times. Once the power supply failed on the server resulting in days of downtime and expedited shipping costs. I didn’t like being a slave to managing the production servers with the regular demands of life and work commitments.
Security: Software patching and upgrades both are part of important security practices. With the increased sophistication of hackers this is only one of the responsibilities to keep your site and home LAN secure. Hosting all of your services in the public cloud solves many security challenges.
ISP SPAM monitoring: One day in 2010 my home broadband was down. I was surprised to learn that my ISP shut my service off due to an abnormally high amount of inbound SPAM detected. This was inconvenient and started the process to move my email domain to the cloud.
Cost: Cloud services can range in cost from free to a significant recurring expense. I discovered low cost solutions to migrate these services to the cloud, making this a viable solution.
WordPress: WordPress is the leading blogging Content Management System (CMS). It is the most widely supported platform, with thousands of themes and plenty of educational content. I learned how to use WordPress in a few days this week since it’s intuitive and full featured.
Ubuntu Linux: I decided to self-host WordPress for content development, management, and publishing. Since I already own a VMware vSphere based home lab, I quickly spun up a new Ubuntu Server 18.04 LTS virtual machine (VM) on it. I selected the Docker option during Ubuntu installation so I could deploy the multi-container WordPress package. The following blog provided instructions to install the WordPress containers. As an alternative, this solution may work with WordPress on a Windows PC or Mac, but I haven’t tried it.
AWS S3 Static Web Site Hosting: This is a simple and straightforward service which serves static websites. Hosting a static web site on S3 is an order of magnitude less expensive than paying for WordPress in a SaaS or IaaS model.
The first step is to configure S3 for hosting websites which is documented here. Copy all static website files generated from WP2Static to S3 allowing the public to read. Next configure S3 to use your index.html file created by WP2Static.
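One common way to allow public reads is a bucket policy that grants `s3:GetObject` on every object in the bucket. The sketch below only builds that policy document as JSON. The bucket name `example-bucket` is a placeholder, and whether a bucket policy or per-object permissions were used for this blog isn't stated, so treat it as one possible approach.

```python
import json


def public_read_policy(bucket_name):
    """Build an S3 bucket policy that lets anyone read objects in the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# "example-bucket" is a hypothetical name used only for illustration
print(public_read_policy("example-bucket"))
```

The printed JSON can be pasted into the bucket policy editor in the S3 console for the bucket that holds the WP2Static output.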
AWS Route 53 Domain Registration: I chose AWS to register my bitofsnow.com domain due to the low cost $12/year which includes domain privacy and lock. I noticed other providers are cheaper but domain privacy was an add-on making them more expensive. Once I requested my domain it took 18 minutes to go live and push the .com entry to Verisign. I was happy with the quick provisioning since AWS warned me it could take up to 3 days.
Cloudflare DNS: I’m using Cloudflare for my bitofsnow.com domain since it’s free, simple, fast, and secure. I have used the Cloudflare 1.1.1.1 DNS resolver on my home router since they launched the service and have been pleased with it. An alternative is AWS Route 53, but it’s a paid service. Once the DNS for a domain is configured there isn’t any additional work. Cloudflare also offers DNS analytics for free which shows requests, traffic by country, and stats.
Google Analytics: Without a website analytics system it would be difficult to determine if the blog has an audience and what posts are popular. I selected Google Analytics since it’s a leading solution and free. AWS provides website analytics through their CloudFront CDN. I didn’t require a CDN which is an extra cost.
Zoho Email: Email wasn’t required for my blogging solution. However, I’m taking advantage of the custom domain name I bought and using it for my personal email address. I didn’t find any free robust email solutions which support a custom domain name. I came across Zoho and was impressed by the value of their Mail Lite offering at $12 a year. It’s a modern email platform with a web mail experience similar to Gmail and Outlook.com.
I was able to find detailed instructions on the web on how to configure each piece, but not a complete solution for my needs. I hope this solution overview provides motivation for someone who’d like to get started blogging but has the same concerns I had.