Rick Koopman, EMEA HPC and AI Technical Leader at Lenovo Data Centre Group, discusses how the company is supporting the sustainability of HPC globally.
As the power demand of high-performance computing (HPC) increases, the coolest solution for keeping thousands of servers from overheating turns out to be their own warm wastewater.
The Leibniz Supercomputing Centre (LRZ) in Munich, Germany, is home to no ordinary supercomputer. Sure, there are thousands of servers, or nodes, stacked in rows in a windowless vault. As technicians look on, they are all working away on huge data-crunching conundrums for research organisations, running simulations to better predict future natural disasters such as tsunamis and earthquakes.
But it’s eerily quiet. Almost too quiet. The familiar whir of hot air being whooshed away by power-hungry computers is almost entirely absent. Where are all the fans?
Almost all gone, as it turns out. The LRZ SuperMUC NG, which uses massive arrays of Lenovo’s ThinkSystem SD650 servers, requires hardly any fans at all; only those cooling the power supply units, and those in the in-row chillers on every eighth row, remain.
As a result, “the ambient noise in the datacenter is now lower than in a typical office space,” says Rick Koopman, EMEA Technical Leader for High Performance Computing at Lenovo.
Despite the near-total absence of fans, Lenovo has kept the LRZ running all this time while cutting its energy use by 40 percent, greatly lowering the centre’s electricity bill and its environmental impact at the same time. “We wanted to optimize what we put into a supercomputer and what comes out of it from an efficiency perspective,” he says.
A green giant
Lenovo has long been a key player in the HPC sector; in fact, in 2017 it set a goal of becoming the world’s largest provider of supercomputing systems, as ranked by the TOP500 project, by 2020, a target it met just one year later.
At a glance, sustainable supercomputing sounds like an oxymoron. After all, as processors become faster and faster, they require more and more power.
When the company first began working on SuperMUC at LRZ in 2012, typical HPC compute nodes used processors requiring 100-120W (watts) of power each. That figure is now typically over 200W and will increase further to over 300W in 2021. And the greater the wattage, the more heat must be removed to keep the processors within their optimal operating temperature range: with the current generation of processors, once the internal junction temperature rises above 80 degrees Celsius, the silicon in the chips begins to break down.
So how do you bring down the energy costs and increase operational efficiency as requirements ramp up?
New ways, sustainable solutions
As power demands increase, the problem worsens, and a new solution is needed. You still have to get rid of the heat you generate, but the tried and tested method, fans and air, is no longer enough to remove heat from the servers efficiently.
“The old school way is chilling the datacenter room and using fans to blow the hot air away,” says Koopman. Hence all the noise. But air cooling is far from efficient for current and future HPC solutions, and as systems pack hardware into ever denser arrays, it is not even workable.
“We’re reaching a point where air cooling is not an option anymore,” he explains. “You can do that up to around 32-36 kilowatts (kW) maximum with the support of rear-door heat exchangers; anything higher than that can’t be efficiently done with air – and 36 nodes in a standard compute rack, each node consuming up to 3000W, is going to bring us racks that require over 90kW of power connectivity and cooling. You can’t get rid of the air fast enough there; you would need a hurricane to move it.”
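To put rough numbers on that quote, a back-of-the-envelope calculation is enough. The 36-node rack, the 3000W per node and the 32-36kW air-cooling ceiling are the figures Koopman cites; everything else below is purely illustrative.

```python
# Back-of-the-envelope rack power estimate, using the figures quoted above.
# Illustrative only; not a Lenovo specification.

NODES_PER_RACK = 36          # nodes in a standard compute rack, per the quote
WATTS_PER_NODE = 3000        # peak draw per node, per the quote
AIR_COOLING_LIMIT_KW = 36    # rough ceiling for air with rear-door heat exchangers

rack_power_kw = NODES_PER_RACK * WATTS_PER_NODE / 1000
print(f"Rack power: {rack_power_kw:.0f} kW")  # 108 kW
print(f"Excess over the air-cooling ceiling: {rack_power_kw - AIR_COOLING_LIMIT_KW:.0f} kW")
```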
This is where Lenovo introduced warm water cooling, for the first time at large scale, at LRZ in 2012, and the advantages over air cooling are manifold. The same mass of water stores roughly four times more heat than air for a given temperature change, and the water supply can be brought into direct contact with all the elements that need to be cooled, making the process much more targeted. “The heat transfer to water is just much more efficient,” Koopman says.
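That "four times" figure follows directly from the specific heat capacities of the two media (Q = m·c·ΔT): per kilogram, water absorbs roughly four times as much heat as air for the same temperature rise. A minimal sketch using textbook values:

```python
# Heat absorbed by a mass of coolant warming by delta_t: Q = m * c * delta_t
# Specific heat capacities are approximate textbook values at room conditions.
C_WATER = 4186  # J/(kg*K)
C_AIR = 1005    # J/(kg*K), at constant pressure

def heat_absorbed(mass_kg: float, c_j_per_kg_k: float, delta_t_k: float) -> float:
    """Energy in joules absorbed by mass_kg of a medium warming by delta_t_k."""
    return mass_kg * c_j_per_kg_k * delta_t_k

# 1 kg of each medium warming by 10 K:
q_water = heat_absorbed(1.0, C_WATER, 10.0)
q_air = heat_absorbed(1.0, C_AIR, 10.0)
print(f"Water: {q_water / 1000:.1f} kJ, air: {q_air / 1000:.1f} kJ, ratio ~{q_water / q_air:.1f}x")
```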
Since the water is contained in a pipe system, it is straightforward to reuse it over and over. Depending on the location of the data centre and the outdoor temperature, simply running it through heat exchanger equipment on the roof allows the excess heat from the hardware to radiate away.
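Whether that rooftop heat rejection works on a given day comes down to comparing the temperature of the returning water with the outdoor air: the water has to be warm enough relative to ambient for a dry cooler to shed the heat. A minimal sketch of that check, with assumed temperatures that are illustrative rather than taken from the article:

```python
# Rough feasibility check for rooftop heat rejection: the returning warm water
# must exceed the outdoor air temperature by at least the dry cooler's approach.
# Temperatures in degrees Celsius; all values are illustrative assumptions.

APPROACH_TEMP_C = 5.0  # assumed minimum water-to-air difference the dry cooler needs

def free_cooling_possible(water_return_c: float, outdoor_air_c: float) -> bool:
    return (water_return_c - outdoor_air_c) >= APPROACH_TEMP_C

# Warm water cooling returns water far above ambient, so even a hot day usually works:
print(free_cooling_possible(water_return_c=50.0, outdoor_air_c=30.0))  # True
print(free_cooling_possible(water_return_c=50.0, outdoor_air_c=47.0))  # False
```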
A three-pronged approach
Warm water cooling is also just one element of Lenovo’s Neptune liquid cooling technology, which approaches datacenter energy efficiency in three ways: warm water cooling itself, software optimisation (which has delivered over 10 percent additional energy savings by throttling hardware when needed), and infrastructure advances.
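The article does not spell out how the software side throttles hardware, but on Linux servers one common mechanism is capping processor power through the kernel's powercap (Intel RAPL) sysfs interface. The sketch below is a generic illustration of that idea rather than Lenovo's actual tooling; the sysfs path is the standard RAPL location for CPU socket 0 and the 200W cap is an arbitrary example value.

```python
# Generic illustration of software power capping on Linux via the powercap
# (Intel RAPL) sysfs interface. Not Lenovo's tooling; needs root privileges.
from pathlib import Path

RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")    # package domain, socket 0
LIMIT_FILE = RAPL_DOMAIN / "constraint_0_power_limit_uw"  # long-term limit, in microwatts

def current_power_cap_w() -> float:
    """Return the socket's current sustained power cap in watts."""
    return int(LIMIT_FILE.read_text()) / 1_000_000

def set_power_cap_w(watts: float) -> None:
    """Cap the socket's sustained power draw to `watts`."""
    LIMIT_FILE.write_text(str(int(watts * 1_000_000)))

if __name__ == "__main__":
    print(f"Current cap: {current_power_cap_w():.0f} W")
    set_power_cap_w(200)  # example: throttle the socket during low-priority workloads
```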
The third element, infrastructure advances, is perhaps the most remarkable from a sustainability perspective. For LRZ’s SuperMUC NG, Lenovo has rolled out adsorption chilling technology that effectively creates cold water from warm water, which is then used to cool the storage and networking racks.
Fewer conventional chillers are needed to produce this cold water, and that adds up at supercomputer scale. Just as importantly, as datacenters everywhere become more powerful, these techniques will be applicable elsewhere in the IT industry too.
“The increasing need for power and cooling is going to be a problem for the entire IT industry, not just the supercomputer industry. Wattage of processors, accelerators and other components used in servers is going up. Each and every datacenter will experience this issue.”
And when they do, Lenovo will have the solution ready and waiting for them.