Reducing consumption and improving sustainability: Part 4, Turn it off … and on again!

By Mat Brown, Technical Lead, Data Center and Sustainability
June 28, 2024
7:00 am

How to use Prism Central and NCM features to plan consumption and reduce demand.

Scale your clusters up and down if you don’t need all the nodes on all the time to potentially save on energy costs and carbon emissions.

Yes! This is something we can do! Since the conceptual birth of the IT Sys Admin, the long-kept secret to making broken things work has been to “power cycle” them. “Have you turned it off and on again?” has been the solution to an unfeasibly vast number of IT support cases. But did you know that the same approach can be used to save energy and, by extension, reduce environmental impact and costs? The secret is that once you’ve turned a device off, just leave it a bit (a lot) longer before turning it on again. Preferably hours, days, or even months. Read on and we’ll describe how you can determine whether this is a viable cost and energy saving opportunity for your organization.

In part 2 of this blog series we discussed, among other things, incremental scaling up of infrastructure as a more long term strategy and showed how only “taking what you need” as you need it can have a marked effect on consumption. But for this blog post we’re going to look at something a bit more tactical, i.e. in the short and medium term.

Through its Lifecycle Manager (LCM) and Foundation features Nutanix has massively simplified the process of adding capacity and keeping software and firmware up to date. Adding capacity is a “1 click process” and now multi node removal is just as easy (actually, it’s ridiculously easy) so managing node capacity to reflect changes in demand can be very straightforward.

Categories are part of standard Prism Central (NCM not required) but can also be used as the basis for more easily taking advantage of many of the features of NCM including reporting.

Is the juice worth the squeeze?

The question is, can the effort required to complete these scaling operations be worth the reduction in energy consumption achieved?

In this context, when we say “effort” that could include:

Manual or physical effort to implement changes and the associated documentation
Electrical energy consumed from moving workloads (between nodes, racks, locations) or in restarting individual nodes / servers
Activity to measure the impact and mitigate the risks involved with the changes

So the rather unexciting, yet sensible answer will always be “it depends”. But more positively, there are some general scenarios where you’re more likely to see success:

Where you can leave things off for a longer period of time (e.g. weeks or months).
Where you can repeat the energy saving activity more regularly (so it can be automated to reduce effort).

For both of these, where the changes can be operationalised then it should be possible to conduct them more efficiently and get a better return on your investment of effort. A highly automated and scalable software defined environment should cut down on the required effort and increase the likelihood of success. But to help know what success might look like, and how to build a business case for it, here are some theoretical examples. We’re using EUC (or VDI) environments as examples, because they tend to have a more recognisable performance / utilization profile, but this could apply to other kinds of environments too.

Examples

If we look at a couple of scenarios you can see why it’s important to give some thought to the required effort and how you might build the business case for scaling your environment down and up to save energy.

Example A: Scaling down for the weekend

Let’s first look at a purely illustrative short term example of a 16 node VDI cluster on premise in the UK.

NB:- By illustrative we mean that the power draw figures in watts are arbitrary but representative for the purposes of this blog (in the opinion of the author). If you plan on using this calculation in any way, it’s worth using one of the methods described in this document to grab some measurements of your own. Additionally, the other sources used to calculate carbon emissions are linked, but may differ from those used by your organization. It may be worth contacting your organization’s sustainability team to see what sources they use for carbon intensity grid factors or other sustainability metrics.

Firstly, let’s assume that during business hours, when users are all logged in and working, the nodes use about 700W each (700W is an informed example figure).

At the weekend, when 95% of users aren’t working, the cluster is almost completely idle and the nodes run at 200W each (again, an informed example figure).

A cunning sys admin might suggest that energy savings can be made by turning off some of the nodes at the weekend. So assuming they’ve done all their due diligence around the weekend capacity, change control etc., let’s see how that business case might work out:

Weekend time: 17:00 Friday – 08:00 Monday = 63 hours

Assuming 1 hour node removal, 1 hour node addition and 1 hour contingency let’s call it 60 hours.
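As a quick sanity check, the weekend window can be worked out with a few lines of Python. This is only a sketch: the two dates are arbitrary illustrative values (any Friday and the following Monday would do), and the three one-hour allowances are the ones assumed above.

```python
from datetime import datetime, timedelta

# Illustrative dates only: any Friday 17:00 and the following Monday 08:00.
scale_down_start = datetime(2024, 6, 28, 17, 0)   # Friday 17:00
scale_up_deadline = datetime(2024, 7, 1, 8, 0)    # Monday 08:00

# Dividing one timedelta by another gives the length of the window in hours.
weekend_hours = (scale_up_deadline - scale_down_start) / timedelta(hours=1)

# Allowances from the example: node removal, node addition and contingency.
overhead_hours = 1 + 1 + 1
usable_hours = weekend_hours - overhead_hours

print(f"Weekend window: {weekend_hours:.0f} hours, usable: {usable_hours:.0f} hours")
```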

Say 4 of the 16 nodes can be turned off: 200W x 4 nodes is 800W. 800W x 60 hours = 48kWh of energy.

Next we should include the PUE (Power Usage Effectiveness) of the facility where the environment is being run. A reasonable average PUE is 1.8 and although there are potentially some nuances to this calculation, especially if you host the environment in your own facility, that makes the savings 86.4kWh.
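The calculation so far can be sketched as a small function. The name `energy_saved_kwh` and its parameters are made up for illustration; the inputs are the example figures from this post, so swap in your own measurements and PUE.

```python
def energy_saved_kwh(idle_watts, nodes_off, hours_off, pue=1.0):
    """Energy avoided by powering nodes off, scaled by the facility PUE."""
    it_energy_kwh = idle_watts * nodes_off * hours_off / 1000  # Wh -> kWh
    return it_energy_kwh * pue

# Example figures from this post: 4 nodes idling at 200W, off for 60 hours.
it_only = energy_saved_kwh(200, 4, 60)
with_pue = energy_saved_kwh(200, 4, 60, pue=1.8)

print(f"{it_only:.1f} kWh at the nodes, {with_pue:.1f} kWh at the facility")
```

Keeping PUE as a separate parameter makes it easy to see how much of the saving comes from the nodes themselves and how much from the facility overhead.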

So far so interesting, but what does 86.4kWh mean to you?

That is about 1.5x the energy used by a typical UK home for a week.

The benefits can be both financial and environmental i.e., carbon emissions.

First let’s look at the financial benefit as it’s pretty simple to calculate. The current average UK price of electricity is around 31.3p per kWh, so 86.4 x 31.3p is about £27.04 in savings for every weekend, or £1406.25 per year if you scale down every weekend, maybe slightly more if you can include some public holidays and holiday low periods too.
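Here is a minimal sketch of that price calculation, assuming 52 weekends a year and the example tariff of 31.3p per kWh. The function name is hypothetical; replace the constants with what you actually pay.

```python
PRICE_PENCE_PER_KWH = 31.3   # example UK average used in this post
ENERGY_SAVED_KWH = 86.4      # per weekend, including PUE
WEEKENDS_PER_YEAR = 52       # assumption: every weekend, no extra holidays

def cost_saving_gbp(energy_kwh, pence_per_kwh):
    """Convert an energy saving into pounds at a flat per-kWh tariff."""
    return energy_kwh * pence_per_kwh / 100  # pence -> pounds

per_weekend = cost_saving_gbp(ENERGY_SAVED_KWH, PRICE_PENCE_PER_KWH)
per_year = per_weekend * WEEKENDS_PER_YEAR

print(f"£{per_weekend:.2f} per weekend, £{per_year:.2f} per year")
```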

Of course this calculation heavily depends on the price of electricity where you are and what you are actually paying. Some data center co-locations have minimum commit levels so this scaling down approach may not make any actual difference to the price being paid. So let’s take a look at the environmental impact.

In terms of carbon emissions, the UK has a very variable grid due to the high proportion of renewables, and weekends are often periods with a better mix of low carbon generation because heavy industry is generally offline. For the moment, though, let’s use the UK average for 2022 of 0.19338 KGCO2e/kWh. We can then calculate 86.4kWh x 0.19338 = 16.708032 KGCO2e.

Therefore, in a year the benefit of implementing this “weekend scaledown” approach could be of the order of 16.708 x 52 = 868.8KG of CO2e emissions. That’s about the same CO2 absorbed by five fully mature trees in a year.
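The carbon arithmetic follows the same pattern. This sketch assumes a single average grid factor and 52 weekends a year; in practice the factor varies by hour and by region, so treat the output as a rough estimate.

```python
GRID_FACTOR_KG_PER_KWH = 0.19338  # UK 2022 average quoted in this post
ENERGY_SAVED_KWH = 86.4           # per weekend, including PUE
WEEKENDS_PER_YEAR = 52            # assumption: every weekend

def emissions_avoided_kg(energy_kwh, grid_factor):
    """Estimate CO2e avoided for a given energy saving and grid factor."""
    return energy_kwh * grid_factor

per_weekend_kg = emissions_avoided_kg(ENERGY_SAVED_KWH, GRID_FACTOR_KG_PER_KWH)
per_year_kg = per_weekend_kg * WEEKENDS_PER_YEAR

print(f"{per_weekend_kg:.1f} kgCO2e per weekend, {per_year_kg:.1f} kgCO2e per year")
```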

What this all means to you and your business depends on whether these benefits represent a worthwhile endeavor and what effort is required to implement them. We hope that Nutanix’s technology derisks and simplifies implementation sufficiently to give you a better chance of success vs competing technologies. But to put it into context, if your IT budget is £10 million per year then that effort on its own is going to save about 0.014% of the total IT budget.

A marginal budgetary saving at best and that’s if you pay for energy use. Some co-location or hosting provider contracts mean that, although you’ll be helping reduce carbon emissions from energy generation, you might not actually save your IT budget any money, so always check how the energy tariff and billing works before you build your business case!

Additionally, shutting down and booting up nodes and/or VMs comes with a temporary increase in utilization that may nullify your potential reductions if the resulting scaledown period is not long enough. Taking measurements ahead of any changes and doing some basic estimations should give you an idea of if these activities will produce the desired results. But, of course, we are using example numbers here so things could be more clear cut in your environment and ultimately the only true measure comes after implementation.
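One way to do that basic estimation is to compare the idle energy avoided with the extra energy spent on the scaling operations themselves. In the sketch below, `transition_kwh_per_node` is a hypothetical figure (1.5kWh per node for one remove-and-re-add cycle) chosen purely for illustration; you would need to measure it in your own environment.

```python
def net_saving_kwh(idle_watts, nodes_off, hours_off, transition_kwh_per_node):
    """Idle energy avoided minus the energy spent removing and re-adding nodes."""
    gross_kwh = idle_watts * nodes_off * hours_off / 1000
    overhead_kwh = transition_kwh_per_node * nodes_off
    return gross_kwh - overhead_kwh

def break_even_hours(idle_watts, transition_kwh_per_node):
    """Hours a node must stay off before the transition overhead is paid back."""
    return transition_kwh_per_node * 1000 / idle_watts

# Hypothetical overhead: 1.5 kWh per node per remove-and-re-add cycle.
print(break_even_hours(200, 1.5), "hours to break even")
print(net_saving_kwh(200, 4, 60, 1.5), "kWh net saving per weekend (before PUE)")
```

If the planned scaledown window is shorter than the break-even time, the operation costs more energy than it saves.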

Example B: Scaling up for “Autumn Pressure”

So let’s take a look at what could happen with a retail company that does most of its business during the holiday-induced retail rush from November through January. For simplicity’s sake, let’s look at the same 16 node cluster being scaled down to 12 for February-October, which is ¾ of the year, or about 274 days or 6576 hours.

200W x 4 x 6576 x 1.8 PUE = 9469.44kWh. That’s enough to power a typical UK home for over 3 years, or £2964 in savings. Furthermore, it’s about 1831KG of CO2e, or nearly the CO2 absorbed by 11 mature trees in a year.
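The Example B figures can be reproduced with the same inputs used for Example A. This is a sketch using this post's example values (200W idle draw, PUE of 1.8, 31.3p per kWh, 0.19338 KGCO2e/kWh); none of them are measurements.

```python
IDLE_WATTS = 200
NODES_OFF = 4
HOURS_OFF = 274 * 24              # February to October, about 274 days
PUE = 1.8
PRICE_PENCE_PER_KWH = 31.3
GRID_FACTOR_KG_PER_KWH = 0.19338

# Energy at the facility, then the cost and carbon that follow from it.
energy_kwh = IDLE_WATTS * NODES_OFF * HOURS_OFF / 1000 * PUE
cost_gbp = energy_kwh * PRICE_PENCE_PER_KWH / 100
carbon_kg = energy_kwh * GRID_FACTOR_KG_PER_KWH

print(f"{energy_kwh:.2f} kWh, £{cost_gbp:.0f}, {carbon_kg:.0f} kgCO2e")
```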

Comparison of examples

Let’s look at the potential yearly savings for both of these illustrative examples:

Example A: 4,492.8kWh, £1406.25, 868.8 KGCO2e, 5 Mature Trees, 104 operations per year

Example B: 9,469.44kWh, £2964, 1831 KGCO2e, 11 Mature Trees, 2 operations per year
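To make the difference in effort more tangible, the yearly energy saving can be divided by the number of scaling operations. This sketch takes Example A as 52 weekends at 86.4kWh each; the `Scenario` class is just an illustrative container.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    yearly_kwh: float
    operations_per_year: int

    @property
    def kwh_per_operation(self) -> float:
        """Energy saved for each scale-down or scale-up operation."""
        return self.yearly_kwh / self.operations_per_year

scenarios = [
    Scenario("Example A (weekends)", 86.4 * 52, 104),
    Scenario("Example B (seasonal)", 9469.44, 2),
]

for s in scenarios:
    print(f"{s.name}: {s.kwh_per_operation:.1f} kWh saved per operation")
```

On these numbers each operation in Example B saves roughly a hundred times more energy than one in Example A, which is why Example A leans so heavily on automation.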

In an effort to help better understand the level of effort required, we’ve included the number of scaling operations per year. This does not fully take into account the effort involved in planning, implementing, documenting and repeating the changes necessary. But, it can give an indication. What’s clear is that Example A is only going to be worthwhile if it can be near enough fully automated to minimize the effort required. If it can be operationalised so that there is minimal overhead / risk to the service, then over the years it is running it could be worth the “squeeze” and the fact that it runs every week could mean that it’s much easier to make it part of regular operations.

Using automation tools like X-Play, together with Capacity Runway and other features of Nutanix Cloud Platform (NCP), means it can be much easier to operationalise the scaling process. If it can be fully automated, then it could be quite possible to successfully implement this strategy on a weekly basis and see a worthwhile return on the investment of effort.

For Example B, it might be that the months of not using some of the nodes mean that it’s more efficient for your organization to manually scale the clusters than spend time fully automating how it’s done. A twice-yearly occurrence wouldn’t usually warrant the investment of time in automating and scheduling the scaling. Furthermore, if the scaling automation isn’t happening regularly then it could be that IT teams forget it’s happening and it then comes as a surprise when nodes start ejecting from clusters!

Autoscaling in Public Cloud with NC2

One potentially very simple way to scale your infrastructure is by using the Nutanix Cloud Clusters (NC2) solution. For Nutanix clusters running in a public cloud, the NC2 console makes adding and removing nodes very straightforward, and with some simple X-Play automation that cluster can be configured to autoscale according to its utilization. This video shows how to both scale a cluster up as it becomes more heavily utilized and scale it back down once utilization goes below a certain threshold. Once configured, the scaling process requires no manual intervention and can happen in as little as 30 mins. Furthermore, the X-Play automation can be easily configured to send emails or enter ServiceNow tickets to track the activity according to your organization’s change governance policy. With some adaptation, a similar approach can be used in data center deployments, providing a great way to scale your infrastructure as demand requires it.
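The decision logic behind threshold-based autoscaling can be sketched in plain Python. To be clear, this is not X-Play or NC2 code: the function name, thresholds and node limits are all hypothetical, and a real playbook would be built in the product itself. The sketch only illustrates why the scale-up and scale-down thresholds are kept apart, so the cluster doesn't flip back and forth around a single value.

```python
def scaling_decision(cpu_utilization_pct, node_count, *,
                     scale_up_above=75.0, scale_down_below=40.0,
                     min_nodes=3, max_nodes=16):
    """Return 'add_node', 'remove_node' or 'hold' for the current utilization.

    All thresholds and limits are illustrative assumptions, not product defaults.
    """
    if cpu_utilization_pct > scale_up_above and node_count < max_nodes:
        return "add_node"
    if cpu_utilization_pct < scale_down_below and node_count > min_nodes:
        return "remove_node"
    # Between the two thresholds (or at a node limit) nothing changes.
    return "hold"

for utilization in (85, 55, 30):
    print(utilization, scaling_decision(utilization, node_count=12))
```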

Compute Only Nodes

Be mindful that ejecting nodes that host a lot of data can be a time and energy consuming process as the data is reprotected across remaining nodes. To mitigate this, consider the use of compute only (CO) nodes. CO nodes could be a good option because, when they are removed from the cluster, there is no data to be copied for reprotection purposes so the ejection process should be a lot quicker (just waiting for VMs to migrate to the remaining nodes). To help you decide if CO nodes are worth considering for you, here are some useful links to further information on CO nodes:

What is a “compute only node”? https://next.nutanix.com/how-it-works-22/what-are-the-nutanix-compute-nodes-37488

Portal documentation: https://portal.nutanix.com/page/documents/details?targetId=Web-Console-Guide-Prism-v6_7:wc-compute-only-node-c.html

Multi node expansion:

You can also talk to your Nutanix account team or resale partner technical team to see if compute only nodes could be a good choice for your organization.

Addressing Risk

IT environments constantly evolve and while even a minor change can introduce risk, not changing can bring its own problems. Unpatched systems face cyber threats and regulatory non-compliance. Outdated software limits the ability to use modern technologies, which can hinder competitiveness. Overprovisioning of resources, unrestricted consumption or failing to optimize can also add an unnecessary financial burden to organizations.

The key is balancing risk with progress. IT teams must increasingly consider environmental and sustainability metrics in their decision making. Mastering this balance is crucial for navigating the changing IT landscape.

Conclusion

As previously mentioned, every organization is different and to understand if there’s a viable business case to implement down / up cluster scaling (or any other kind of energy reduction measures) you will need to consider the specifics of your situation.

But if you can get the basics together you might be able to see if there’s a case for action. By way of a checklist, here’s what you’ll need to get together.

Where in the world your environment is located, so as to determine:
- The cost of electricity, if indeed you are paying. If you can’t find a bill then research an average for your region.
- The carbon intensity of your electricity grid from a credible source like Electricity Maps or whatever your organization uses as a source for carbon intensity grid factors.
The idle power of nodes at the low period (e.g. weekend or at quiet season). Use this Technote for information on how to do that.
The PUE of your data center if you know it, or a reasonably sourced average
An assessment of your cluster’s capacity (use Capacity Runway!) to make sure that it won’t miss the nodes you want to turn off, either for compute or storage.
How many scaling operations per year do you anticipate? How many hours of node downtime will this result in?
Take some time to investigate the Nutanix Carbon and Power Estimator, an educational tool that demonstrates how some of the key concepts used above can impact energy consumption and the associated carbon emissions.
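For the capacity item in the checklist above, a very rough first pass might look like the sketch below. It is no substitute for Capacity Runway: it only considers compute, the desktop counts are hypothetical, and storage capacity and minimum cluster size need checking separately.

```python
import math

def nodes_required(active_desktops, desktops_per_node, spare_nodes=1):
    """Nodes needed for the low-period load, plus spare nodes for failover."""
    return math.ceil(active_desktops / desktops_per_node) + spare_nodes

def safe_to_remove(total_nodes, nodes_to_remove, active_desktops,
                   desktops_per_node, spare_nodes=1):
    """True if the remaining nodes can still carry the load (compute only)."""
    remaining = total_nodes - nodes_to_remove
    return remaining >= nodes_required(active_desktops, desktops_per_node, spare_nodes)

# Hypothetical figures: 80 desktops active at the weekend, 100 desktops per node.
print(safe_to_remove(16, 4, active_desktops=80, desktops_per_node=100))
```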

Collating this information should be reasonably straightforward, at least at a high level, then as the business case is developed further detail can be obtained and the full risk / reward of implementing this strategy can be assessed.

Our hope is that, if facilitated by smart technology, properly designed processes and backed by a world class support organization, organizations can start to think seriously about how they can scale their infrastructure on demand and reduce the overall energy consumption.


© 2024 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to or be based on studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not independently verified, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.
