Motivation to run a resource donation marathon

Deploying the VMware Folding@home fling to join the world's largest distributed supercomputer is a worthwhile and interesting pursuit. Scientific research will require a monumental number of person-years over a long period to develop treatments and a vaccine. Standing up the Folding@home software is only the first step. It will take a marathon to win this race.

Author pictured after completing 26.2 miles of the 2011 Seattle Marathon

My previous blog post described how to contribute home lab resources with a negligible impact on performance and responsiveness. This is only the first obstacle to overcome. When I was training for the marathon, I “hit the wall” during a 20-mile training run: once I depleted all of my energy stores, I lost any motivation to move another step. I learned from that experience and accepted every GU energy supplement offered during the race to finish the Seattle Marathon. Contributing computer resources to researchers isn’t sustainable if your electricity bill doubles. The fear of a large energy bill is another way of “hitting the wall”. Beyond the personal financial impact, natural resources are used inefficiently if someone else can provide IT resources more efficiently.

Measurement

In 2003 I attended an event in Pasadena, CA where the late Peter Drucker spoke. Mr. Drucker has been described as “the founder of modern management”.

Book Peter Drucker signed during the event

I learned during his speech how important measurement is to achieve organizational goals. I took his lesson and started measuring to understand whether donating computing resources was a sustainable activity for me. Next I needed to decide what to measure.

Measurement: Electricity usage

All servers, NAS, and networking infrastructure are plugged into a CyberPower CST135XLU UPS I bought at Costco. The UPS measures the electricity used by all of the equipment in the half-rack, not only the servers.

CyberPower CST135XLU UPS (lower right), which powers my half-rack home lab. The UPS distributes power through a CyberPower RKBS15S2F10R 15A 12-Outlet 1U RM Rackbar Surge Suppressor (in the middle of the rack, with two green LEDs on the left side)

This UPS supports CyberPower’s PowerPanel Business VMware virtual appliance. It provides detailed reporting in addition to a graceful shutdown capability during a power outage to protect my vSAN datastore.

PowerPanel Business logs the energy load percentage every 10 minutes. Watts consumed is calculated by multiplying the energy load percentage by the total capacity of the UPS, which is 810 watts. For example, a reading of 35% energy load represents 283.5 watts.
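The load-to-watts conversion is simple enough to script. A minimal sketch, using the 810-watt CST135XLU capacity quoted above:

```python
# Convert a UPS energy-load percentage reading into watts consumed.
UPS_CAPACITY_WATTS = 810  # total capacity of the CyberPower CST135XLU

def load_to_watts(load_percent: float) -> float:
    """Watts drawn at a given energy-load percentage (0-100)."""
    return UPS_CAPACITY_WATTS * load_percent / 100

print(load_to_watts(35))  # 283.5 watts, matching the example above
```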

Transition from baseline to deploying [email protected]

An Excel pivot table is used to analyze the home lab energy usage data imported from the CyberPower PowerPanel CSV file. The pivot table made it easy to graph, average, and total electricity usage per day.
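The same daily totals can be computed without Excel. A sketch, assuming a hypothetical CSV layout with date, time, and load-percentage columns (check a real PowerPanel export for the actual headers):

```python
import csv
from collections import defaultdict
from io import StringIO

UPS_CAPACITY_WATTS = 810
SAMPLE_HOURS = 10 / 60  # PowerPanel logs a reading every 10 minutes

# Hypothetical CSV layout and sample readings for illustration only.
sample = StringIO(
    "date,time,load_percent\n"
    "2020-04-01,00:00,35\n"
    "2020-04-01,00:10,35\n"
    "2020-04-02,00:00,100\n"
)

# Each 10-minute sample contributes watts * hours / 1000 kilowatt-hours.
kwh_per_day = defaultdict(float)
for row in csv.DictReader(sample):
    watts = UPS_CAPACITY_WATTS * float(row["load_percent"]) / 100
    kwh_per_day[row["date"]] += watts * SAMPLE_HOURS / 1000

for day, kwh in sorted(kwh_per_day.items()):
    print(day, round(kwh, 4))
```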

Excel Graph created from UPS energy data summarized in pivot table.

The graph shows both the lower baseline energy usage and how usage increased after I began donating computing time to Folding@home and Rosetta@home. The dips shown after deploying Folding@home are due to the servers waiting for work units from protein researchers. After work units are received, energy usage increases as server utilization climbs. Finally, energy usage peaks at 100% CPU utilization after I deployed VMware Distributed Resource Scheduler with shares and added Rosetta@home.

Measurement: Cluster compute capacity

VMware vSphere measures the cumulative compute capacity of a cluster, which is more tangible than a percentage of CPU utilization. In my home lab I have 26.4 GHz of CPU capacity, derived as follows:

  • 3 Supermicro SuperServer E300-8D servers each with an Intel Xeon D-1518 CPU
  • Each Intel Xeon D-1518 CPU has 4 cores running @ 2.20 GHz
  • Total cluster compute power: 26.4 GHz = 3 servers * 4 cores each * 2.20 GHz
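The capacity arithmetic above, expressed as code:

```python
# Cluster compute capacity from the hardware inventory above.
servers = 3
cores_per_server = 4   # Intel Xeon D-1518
ghz_per_core = 2.20

total_ghz = servers * cores_per_server * ghz_per_core
print(round(total_ghz, 2))  # 26.4 GHz of cluster compute capacity
```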

Baseline energy use – prior to donating compute resources

VMware vSphere 7.0 CPU performance graph over the previous year for an individual server.

A 25% CPU utilization baseline prior to donating resources was estimated by eyeballing the vSphere annual home lab CPU performance graph above. The baseline consumes 6.6 GHz of compute, which is 25% of the 26.4 GHz total cluster capacity. CyberPower PowerPanel software reported that electricity cost averaged $21.56 per month for 177 kilowatt-hours during the baseline period. Puget Sound Energy supplies electricity @ $0.122/kWh including all taxes.
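Those baseline figures are consistent with each other; a quick check using the numbers from the paragraph above:

```python
# Baseline compute and electricity cost, from the figures quoted above.
total_ghz = 26.4
baseline_ghz = 0.25 * total_ghz   # 25% utilization baseline

kwh_per_month = 177
rate_per_kwh = 0.122              # Puget Sound Energy, including all taxes
monthly_cost = kwh_per_month * rate_per_kwh

print(round(baseline_ghz, 1))     # 6.6 GHz
print(round(monthly_cost, 2))     # ~$21.59, close to the reported $21.56
```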

Incremental energy use after donating spare capacity

A surplus of 19.8 GHz of compute capacity is unused in the cluster, which is 75% of total capacity.

VMware vSphere 7.0 CPU performance graph over a year for an individual server. The sharp CPU % increase at the far right is due to donating computer resources

The sharp increase to 100% CPU utilization on the far right of the graph is from donating computer resources through the Folding@home fling and Rosetta@home. The entire home lab infrastructure, with servers running 24 hours a day, 7 days a week, consumes the majority of the energy even under a light load. The additional 19.8 GHz of compute work across all 3 servers barely increased electricity costs by $1.80 per 5 kilowatt-hours.

The graph and table below illustrate how donating an incremental 19.8 GHz of compute results in a disproportionately small increase in electricity usage. This seems counterintuitive prior to analyzing the data.

The baseline workload consumed the majority of the electricity usage prior to increasing utilization. This illustrates how underutilized data centers waste a majority of their capacity and energy. Utilizing all of the computing capacity is extremely efficient. 

Graph: Incremental energy use and cost for donating unused compute to Folding@home and Rosetta@home
Table: Incremental energy use and cost for donating unused compute to Folding@home and Rosetta@home

A “Muscle Car” Home Lab

Many people purchase retired enterprise class servers on eBay to build a home lab. Used enterprise class servers are inexpensive compared to buying new. Computer enthusiasts enjoy these big iron servers with their many blinking lights and loud whirring fans, much as car enthusiasts treasure a muscle car with a powerful engine. These servers have large power supplies with maximum ratings of 400-900 watts.

The power outlet for my home lab is a typical shared 20 amp residential circuit. Three enterprise class servers pulling 900 watts each would require a 22.5 amp circuit @ 120 volts. This power demand would require new electrical wiring and specialized receptacles installed by an electrician, plus a much larger UPS. Enterprise servers also generate a lot of heat and noise from their cooling fans.
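The amperage math behind that claim:

```python
# Current draw of three enterprise servers at maximum power supply rating.
servers = 3
watts_per_server = 900   # maximum power supply rating
volts = 120              # standard US residential circuit voltage

amps = servers * watts_per_server / volts
print(amps)  # 22.5 A, beyond a shared 20 A residential circuit
```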

One of my co-workers has an exhaust fan which draws the heat from his enterprise servers into a vented attic. Snow doesn’t accumulate on his roof above his home lab due to the heat generated.

I don’t expect enterprise class servers to double their electricity usage if they are already running continuously. I anticipate the same pattern would exist, where incremental compute resources for Folding@home would have a small energy footprint.

If donating compute time changes the home lab usage pattern, it would consume much more energy and could easily double an electric bill. Turning on a home lab only for testing, education, and practice is a much different usage pattern than running a home lab continuously.

A “Green” Home Lab

A goal for my home lab was running it continuously, 24 hours a day, 7 days a week. Energy efficiency, or being “Green”, became a goal for my home lab after performing an energy cost comparison. A used enterprise server with a low purchase price could become the most expensive option after assessing the total cost, including a larger UPS, new high-amperage circuits, cooling, and continuous electricity use over many years.

The SuperMicro SuperServer E300-8Ds in my home lab have laptop-sized power supplies with a maximum rating of 84 watts. This power supply is approximately 10% to 20% of the capacity of an enterprise server power supply.

SuperMicro SuperServer E300-8D – FSP Group FSP084-DIBAN2 84 watt power supply

These power supplies are compliant with US Department of Energy efficiency Level VI, which went into effect in 2016.

Power Supplies – Efficiency Level VI

This standard requires at least 88% efficiency; the remainder is wasted as heat. Less heat makes it difficult to melt snow on your roof, but results in a more sustainable home lab.
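At the Level VI minimum of 88% efficiency, the heat math for an 84-watt supply looks like this (a worst-case sketch at full load; actual efficiency varies with load):

```python
# Waste heat from an 84 W supply at the DoE Level VI efficiency floor.
efficiency = 0.88    # Level VI minimum efficiency
output_watts = 84    # E300-8D laptop-style power supply, full load

input_watts = output_watts / efficiency
waste_heat = input_watts - output_watts

print(round(input_watts, 1))  # ~95.5 W drawn from the wall
print(round(waste_heat, 1))   # ~11.5 W lost as heat
```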

My entire home lab including all of the storage, networking hardware, 2 mini infrastructure servers, and 3 lab servers uses less power than 1 enterprise class server.

Don’t Stop Running

When I ran the Seattle Marathon, I noticed at mile 19 that people around me stopped running and began walking up Capitol Hill from the flat ground along Lake Washington. I kept repeating “keep on running” to myself so I could finish the marathon and keep the momentum going.

Donating excess computer resources in my example is close to free, and it inexpensively provides a great deal of value to researchers. Due to the low incremental cost in energy and money, I have the motivation to keep running this long marathon.

How to have a happy home lab while donating compute resources to coronavirus researchers

My previous blog post described donating home lab compute resources to coronavirus researchers. Will my home lab get bogged down and become painfully unresponsive? This was the first question I had after donating compute resources. Interest in doing good could quickly wane if it becomes difficult to get my work done.

The rapid growth of Folding@home resulted in temporary shortages of work units for computers enlisted in the project. A Folding@home work unit is a unit of protein data which requires analysis by a computer.

Example of the Folding@home virtual appliance fling busy computing work units. It’s more gratifying to see work getting done than to watch an idle server waiting for work units.

While waiting, I “discovered” Rosetta@home

The University of Washington (UW) Institute for Protein Design has a similar project called Rosetta@home. Even though I’m a different kind of UW alumnus (University of Wisconsin, not Washington), I’ve made Seattle my home over the last 12 years, so I joined this project to help my neighboring researchers. It’s not as easy as deploying the VMware virtual appliance fling for Folding@home: first I manually created each VM, deployed Red Hat Enterprise Linux in it, updated the OS, and then installed the BOINC package. The BOINC package is available for many other operating systems.

Rosetta@home in action

What if

What if I could prioritize my regular home lab work AND use excess capacity for Rosetta@home while waiting for the release of new Folding@home work units? Could I keep my home lab fast and responsive while donating excess resources?

CPUs execute instructions constantly, whether or not there is useful work to do, and most of the time they have nothing to do. Instead of filling that empty space with the idle process, Folding@home and Rosetta@home can execute in place of those empty calories.

Scheduling

vSphere’s Distributed Resource Scheduler (DRS) ensures that VMs receive the right amount of resources based on the policy defined by the administrator. I reopened my course manual from the VMware Education “vSphere: Install, Configure, Manage plus Optimize and Scale Fast Track [V6.5]” class and exam I completed in 2018 to refresh my memory on the available scheduling options.

Resource Pools & Shares

The above screenshot shows the DRS resource pools defined to achieve my CPU scheduling goals. This example uses vSphere 7, which was released last week, though this feature has been available for many years. I utilized shares to maximize CPU utilization by ensuring that the 24 logical processors in my home lab are always busy with work instead of executing an idle process which does nothing.

I defined a higher relative priority for regular workloads and a lower priority for “Community Distributed Computing” workloads. The picture below illustrates how the “Community Distributed Computing Resource Pool” is configured with low shares.

My individual regular workload VMs have normal shares by default, a higher relative priority than the low-shares resource pool shown above. This results in a negligible performance impact for my regular workloads; I haven’t noticed the extra load, which fully utilizes the last drops of processing capacity in my CPUs. Below is a cluster-level CPU utilization graph from vRealize Operations 8.0. The 3 CPUs had plenty of unused capacity while they were waiting for Folding@home work units; this is circled in blue, prior to adding Rosetta@home to the cluster. Once I added Rosetta@home with the DRS shares policy, all of the CPU cores in the cluster were fully utilized; this is the area circled in red.

vRealize Operations 8.0 Cluster CPU Utilization

Prioritize Multiple Community Distributed Computing Projects

I also utilized shares to prioritize the remaining CPU resources between Folding@home and Rosetta@home. Shown below is a high relative priority shares resource pool for Folding@home and a low relative priority shares resource pool for Rosetta@home. This configuration starves Rosetta@home for CPU resources when Folding@home is active with work units. If Folding@home is waiting for work units, Rosetta@home will claim all of the unused CPU resources. These relative priorities don’t impact my regular workloads.
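Shares only matter under contention. A toy model of proportional-share allocation (this is a sketch, not DRS itself, and the share values are illustrative) shows why the low-shares pool still claims everything when the high-shares pool has no work:

```python
def allocate(capacity_ghz, pools):
    """Split capacity among pools in proportion to their shares,
    counting only pools that currently demand CPU.
    pools maps name -> (shares, currently_demanding)."""
    active = {name: shares for name, (shares, demanding) in pools.items() if demanding}
    total_shares = sum(active.values())
    return {name: capacity_ghz * shares / total_shares for name, shares in active.items()}

surplus = 19.8  # GHz left over after regular workloads

# Both projects have work units: the high-shares pool dominates.
print(allocate(surplus, {"folding": (8000, True), "rosetta": (2000, True)}))

# The high-shares pool is idle (no work units): the low-shares pool takes it all.
print(allocate(surplus, {"folding": (8000, False), "rosetta": (2000, True)}))
```

Regular workloads sit outside this model; their normal shares keep them ahead of both pools whenever they demand CPU.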

Enterprise IT & Public Cloud Functionality

Large enterprise IT customers use these same features to fully utilize their data center resources. A common example is to place production and dev/test workloads on the same cluster, and provide production workloads a higher priority. Enterprise customers improve their data center return on investment since they don’t have underutilized computing resources. Public cloud providers use this same model to sell efficient compute services.

Happy Home Lab

The home lab is happy since it contributes unused CPU processing power to the community without impacting the performance of everything else. My next blog post will describe the sustainability of the solution and the impact on my Puget Sound Energy electricity bill.

Providing valuable computing resources to coronavirus researchers by giving time

The global pandemic crisis has quickly mobilized a new volunteer community at technology companies and beyond. This community is providing a vast amount of valuable computing resources to leading biomedical academic researchers. One of the reasons researchers need resources is to learn how the coronavirus works. This knowledge can help the development of vaccines and medicines to fight it.

I’ve been fortunate to receive help from countless individuals who contributed to building my talents throughout my life. I can’t sew masks like my wife Michelle is doing to help our front line heroes, but I’m contributing my time and talents to donate computer resources and get the word out.


Michelle Sundquist’s mask production for her niece and co-workers who are nursing assistants in Idaho

Folding@home is the largest volunteer project contributing unused processing power to biomedical researchers studying human proteins. The technology for this project is similar to the popular SETI@home project searching for alien life. Both of these projects use unused processing power from anyone who installs their software. Currently the Folding@home project is the largest supercomputer in the world.

Giving time

Technology companies have vast amounts of computing resources in their data centers, and many of their employees have home labs. These home labs are micro data centers purchased by employees to learn and gain experience with enterprise information technology software. Servers in corporate or micro data centers are sized for maximum demand and often have unused capacity.

Author’s home lab pictured as it works on coronavirus research. The lab consists of 3 Supermicro SuperServer E300-8D servers running the VMware SDDC platform

Folding@home VMware fling

An ad-hoc team at VMware came together to deploy Folding@home both in corporate data centers and in employees’ home labs. The team quickly built and shipped a VMware virtual appliance fling to package the software and make it easy for anyone to deploy. Flings are “engineer pet projects” freely distributed by VMware to the public. Approval was received from Dr. Greg Bowman, the Director of Folding@home, for VMware to host and distribute the virtual appliance with their project. I learned about the fling through an internal Slack channel and quickly deployed it to my 3 servers on March 20th when it was released.

progress to date

Future Technical Blog Posts

Negligible Impact: A future blog post will explain how the Distributed Resource Scheduler (DRS) enforces my policies to give Folding@home only excess compute capacity while not degrading my preexisting workloads.

Sustainability: I’ll also describe the energy impact to my home lab from adding these compute-intensive Folding@home workloads in a separate post. I’ve taken steps for my home lab to use electricity efficiently and make this project sustainable for me.

How you can help

Non-profit Grant from your employer: VMware, like many other companies, provides a service learning program benefit to its employees. A grant to the employee’s non-profit of choice is given for the hours spent volunteering in the community. I’m planning to utilize VMware’s program for my volunteer work on Folding@home. One of the options I’m considering for directing the service learning grant is the Folding@home team at Washington University School of Medicine in St. Louis.

Computer and Personal Time: VMware’s customers and many in the technology industry, from the IT channel through the largest technology companies like IBM, Microsoft, Dell, Google, Apple, and Amazon, have already started a response. CRN recently published an article on how channel partners are jumping in to support the cause. Consider contributing your excess computing capacity, from your laptop to your server farm, by joining the effort already underway at your company. If you’re the first in your organization, deploy the Folding@home VMware virtual appliance fling or the original software directly from Folding@home.

team VMware Folding@home statistics

In the beginning – my low cost blogging solution

Will I attract an audience? How long will I keep publishing this blog? Since I don’t know, I decided to operate this blog as inexpensively as possible. The blog solution I cobbled together is almost free beyond the annual domain name cost.

blog architecture for bitofsnow.com

Goals

Longevity & Low Cost: Due to the unknown demand, low cost is a key goal. I migrated www.foxhill.org from a server in my basement to AWS S3 static website hosting 6 years ago. My total AWS cost for www.foxhill.org has been about $1 for S3 over the entire 6 years. I don’t know about you, but I think that is essentially free. If the blog is low cost, there is little pain in leaving it up during periods of low demand or if I take a break from blogging.

Personalized Domain Name & Flexibility: I like the choice of using any DNS service with my custom domain name. This provides flexibility for future uses I can’t imagine now. It is easier to use a vertically integrated blog SaaS solution, but you may give up full control over your domain name. A blog SaaS has recurring charges for the work, support, and simplicity it provides to an average non-techie customer.

Components

AWS Public Cloud: I have a VMware vSphere based home lab with 3 robust servers which could easily handle the load of a dynamic web server for my blog. I decided to host the blog in the cloud based on my earlier experience with foxhill.org.

I hosted foxhill.org’s web site and email on a server in my basement for 15 years. In 2014 I migrated the site to the cloud and got out of the hosting business for the following reasons:

  • Maintenance: Production servers require frequent software patching to keep the software up to date. My email server would run out of space at inconvenient times. Once, the power supply failed on the server, resulting in days of downtime and expedited shipping costs. I didn’t like being a slave to managing production servers on top of the regular demands of life and work.
  • Security: Software patching and upgrades are both important security practices. With the increased sophistication of hackers, these are only some of the responsibilities of keeping your site and home LAN secure. Hosting your services in the public cloud solves many security challenges.
  • ISP SPAM monitoring: One day in 2010 my home broadband was down. I was surprised to learn that my ISP had shut my service off due to an abnormally high amount of inbound SPAM. This was inconvenient and started the process of moving my email domain to the cloud.
  • Cost: Cloud services can range in cost from free to a significant recurring expense. I discovered low cost solutions for migrating these services to the cloud, making this viable.

WordPress: WordPress is the leading blogging Content Management System (CMS), with thousands of themes and plenty of educational content. I quickly learned how to use WordPress in a few days this week since it’s intuitive and full featured.

The magic sauce – WP2Static: WordPress dynamically renders web pages from content stored in a database. The WP2Static plugin crawls through a WordPress site and creates all of the files needed to stand up a static website. This means there is no code on the web server; the static website consists only of files: HTML, images, CSS, and JavaScript. The static website can be deployed to any web server instead of requiring a WordPress server. Native features are unavailable once WordPress isn’t hosting the website; examples of lost features are comments, search, and most plugins. These trade-offs are acceptable to me to achieve my goals. The benefits of a static website are speed, higher security since there is no server-side code, and deployment flexibility. WP2Static has an automatic website deployment feature, which has been a time saver for deploying my blog to the AWS S3 static website.

One click publishing from WordPress to S3 with WP2Static

Ubuntu Linux: I decided to self-host WordPress for content development, management, and publishing. Since I already own a VMware vSphere based home lab, I quickly spun up a new Ubuntu Server 18.04 LTS virtual machine (VM) on it. I selected the Docker option during Ubuntu installation so I could deploy the multi-container WordPress package. The following blog provided instructions for installing the WordPress containers. As an alternative, this solution may work with WordPress on a Windows or Mac PC, but I haven’t tried it.

AWS S3 Static Web Site Hosting: This is a simple and straightforward service for hosting static websites. The cost of hosting a static web site on S3 is an order of magnitude less than paying for WordPress in a SaaS or IaaS model.

The first step is to configure S3 for website hosting, which is documented here. Copy all static website files generated by WP2Static to S3 with public read access. Next, configure S3 to use the index.html file created by WP2Static.

AWS Route 53 Domain Registration: I chose AWS to register my bitofsnow.com domain due to the low cost of $12/year, which includes domain privacy and lock. I noticed other providers were cheaper, but domain privacy was an add-on, making them more expensive. Once I requested my domain, it took 18 minutes to go live and push the .com entry to Verisign. I was happy with the quick provisioning since AWS warned me it could take up to 3 days.

Cloudflare DNS: I’m using Cloudflare for my bitofsnow.com domain since it’s free, simple, fast, and secure. I have used Cloudflare 1.1.1.1 DNS resolver on my home router since they launched the service and have been pleased with it. An alternative is AWS Route 53, but it’s a paid service. Once the DNS for a domain is configured there isn’t any additional work. Cloudflare also offers DNS analytics for free which shows requests, traffic by country, and stats.

Google Analytics: Without a website analytics system it would be difficult to determine whether the blog has an audience and which posts are popular. I selected Google Analytics since it’s a leading solution and free. AWS provides website analytics through their CloudFront CDN, but I didn’t require a CDN, which is an extra cost.

It was easier than I expected to set up Google Analytics. I copied the JavaScript code snippet provided when I set up my account and pasted it into the WordPress footer.php, just before the closing </body> tag. WP2Static automatically carries the Google Analytics JavaScript into the static web pages it creates.

Zoho Email: Email wasn’t required for my blogging solution. However, I’m taking advantage of the custom domain name I bought and using it for my personal email address. I didn’t find any free, robust email solutions which support a custom domain name. I came across Zoho and was impressed by the value of their Mail Lite offering at $12 a year. It’s a modern email platform with a webmail experience similar to Gmail and Outlook.com.

Conclusion

I was able to find detailed instructions on the web for configuring each piece, but not a complete solution for my needs. I hope this solution overview provides motivation for someone who’d like to start blogging but has the same concerns I had.