Paul McLellan
10 Dec 2021

Thermal in Data Centers

You've probably heard that data centers worldwide consume a huge amount of energy, and they do. The numbers vary depending on the source (and the data center), but a round number is that a data center consumes 100 MW. This number comes from the Energy Innovation (EI) report How Much Energy Do Data Centers Really Use?, and the chart at the start of this post, which also comes from that report, shows that, to a first approximation, half the power goes to the equipment in the data center and the other half goes to getting the power into the building and the heat out.

My favorite example of this is the Google data center at The Dalles in Oregon. The story that makes it fascinating is that there used to be an aluminum smelter there. You perhaps know that aluminum is smelted from alumina (refined from bauxite) using vast amounts of electricity, so smelters are always located near a source of plentiful power, such as a hydroelectric dam. The smelter in Oregon closed, meaning that there was a lot of power available, so Google built a data center there. Plus, it is on the Columbia River, which provides an effectively infinite source of cold water for cooling. It is one of the best examples I know of Schumpeter's creative destruction: a 20th-century aluminum smelter replaced with a 21st-century hyperscale data center.

How Much Is Global Data Center Power Growing?

The natural assumption, since we all know that more and more data centers are being built across the globe, is that the amount of power consumed in total by data centers is growing fast. I've seen speculative graphs showing that the entire world's electricity supply will be going to power data centers (or Bitcoin miners!). Or, as it says in the EI report:

As the number of global internet users has grown, so too has demand for data center services, giving rise to concerns about growing data center energy use. Between 2010 and 2018, global IP traffic—the quantity of data traversing the internet—increased more than ten-fold, while global data center storage capacity increased by a factor of 25 in parallel. Over the same time period, the number of compute instances running on the world’s servers—a measure of total applications hosted—increased more than six-fold.

But it turns out that these concerns were voiced by people who don't know anything about semiconductors, EDA, Moore's Law, and everything that goes into making chips lower and lower power. The total power consumed by data centers globally is something like 200 TWh/year. But that is not much different from a decade ago. Again from the EI report (as is the bar chart to the right):

However, new results from the bottom-up perspective indicate otherwise: Despite rapid growth in demand for information services over the past decade, global data center energy use likely rose by only 6 percent between 2010 and 2018.
...
The finding that global data centers likely consumed around 205 terawatt-hours (TWh) in 2018, or 1 percent of global electricity use, lies in stark contrast to earlier extrapolation-based estimates that showed rapidly-rising data center energy use over the past decade.

In one sense, this is quite remarkable. On the other hand, during the period from 2010 to 2018 (the latest numbers that seem to be available), semiconductor process technology went through several nodes. Remember, in 2010, 28nm had not yet been introduced; by 2018, the leading edge was 7nm. Each node results in a power reduction of around 30%, meaning that the same functionality in 2018 could be provided for about 15% of the power required in 2010. That's one way that it has been possible to have explosive growth in internet traffic, data centers, and cloud computing while continuing to use only about 1% of the world's electricity.
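
As a quick sanity check on that arithmetic, here is a back-of-the-envelope sketch. The 30%-per-node figure is the approximation from the paragraph above, and the count of five node transitions between the leading-edge processes of 2010 and 2018 is an assumption for illustration:

```python
# Back-of-the-envelope check of the node-scaling arithmetic above.
# Assumptions: ~30% power reduction per process node and roughly five
# node transitions between 2010 (pre-28nm) and 2018 (7nm).
power_scale_per_node = 0.70   # same functionality at ~70% of the power per node
node_transitions = 5          # e.g., 40nm -> 28nm -> 16nm -> 10nm -> 7nm (approximate)

remaining_fraction = power_scale_per_node ** node_transitions
print(f"Power for the same functionality: {remaining_fraction:.0%} of the 2010 level")
# Prints roughly 17%, in line with the "about 15%" figure above
```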

Measuring Power

Another change between 2010 and today is that we have much better tools for measuring power when designing chips. One stage beyond simply holding your finger in the wind is to use Excel, and back in 2010 that was pretty much how designers of chips, designers of servers, and designers of entire data centers were doing things. This tended to result in very pessimistic designs, since the one thing you could not afford was chips going into thermal meltdown or whole racks overheating.

It is a hierarchical problem, of course, a bit like "The House that Jack Built". A data center consists of rows and rows of racks. Each rack typically contains a large number of servers with some sort of router on top. The servers are often referred to as "pizza boxes" since that is roughly their size and shape. Inside, instead of pepperoni, there are processors, GPUs, and other chips. These are the source of all the heat. So any approach to improving the accuracy of estimating the power of a rack (and, from there, of a whole data center) starts with accurately calculating the power dissipated by each chip.
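
To make the hierarchy concrete, here is a toy roll-up from chips to servers to racks to a whole facility. Every number in it is a made-up placeholder, not a measurement; the only grounded assumption is the rough 50/50 split between IT load and overhead (a PUE of about 2) mentioned at the top of this post:

```python
# Toy roll-up of power from chips to servers ("pizza boxes") to racks to a
# data center. All numbers are illustrative placeholders, not real data.
chip_powers_w = {"CPU": 200, "DRAM": 5, "NIC": 25, "SSD": 10}

def server_power_w(chips_in_server: dict) -> float:
    """Sum the power of every chip in one pizza box."""
    return sum(chip_powers_w[name] * count for name, count in chips_in_server.items())

pizza_box = {"CPU": 2, "DRAM": 16, "NIC": 1, "SSD": 4}   # hypothetical server
servers_per_rack = 36
racks = 2500                                             # hypothetical data center

per_server = server_power_w(pizza_box)        # watts per server
per_rack = per_server * servers_per_rack      # watts per rack
it_load_mw = per_rack * racks / 1e6           # IT load in megawatts
facility_mw = it_load_mw * 2.0                # PUE ~2: half the power is overhead

print(f"server: {per_server:.0f} W, rack: {per_rack/1e3:.1f} kW, "
      f"IT load: {it_load_mw:.0f} MW, facility: {facility_mw:.0f} MW")
```

With these placeholder numbers the facility load happens to come out near the 100 MW round number quoted earlier, but the point is the structure of the roll-up, not the particular values.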

Voltus and Celsius

The key tools for this in the Cadence portfolio are Voltus and Celsius. If you want to dive deeper into this, see my post Celsius and Voltus: 2+2=5. The primary job of Voltus is to calculate the IR drop everywhere on the chip, but it also produces Voltus Thermal Models (VTM). The VTM contains on-die material properties, metal densities (since metal conducts heat as well as electricity), static and transient power information, and temperature-dependent power.

But that is just a starting point. Chips go in packages, and packages go on boards, so some of the heat from the chip is spread out, since packages and boards conduct heat as well as signals. And there are other thermally important objects around, such as heatsinks, heat pipes, fans, and more. This is where Celsius comes in. For a deeper dive on Celsius, see my posts Celsius: Thermal and Electrical Analysis Together at Last and Under the Hood of Clarity and Celsius Solvers. Celsius actually consists of two different underlying solvers. One is finite element method (FEM) based and handles conduction through the board and, optionally, radiant heat. The other is computational fluid dynamics (CFD) based and handles everything to do with fluids, typically airflow but also the fluids involved in so-called water cooling (which typically uses other fluids).
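
To give a feel for what such solvers are accounting for at the very simplest level, here is a deliberately oversimplified lumped-resistance estimate of junction temperature. The formula (Tj = Ta + P × θJA) is standard; the power, thermal resistance, and ambient values are made-up placeholders, and a real Celsius analysis resolves the full 3D geometry and airflow rather than a single resistance:

```python
# Oversimplified lumped thermal estimate: junction temperature from a single
# junction-to-ambient thermal resistance. All values are placeholders; a field
# solver models the actual die, package, board, heatsink, and airflow.
def junction_temp_c(power_w: float, theta_ja_c_per_w: float, ambient_c: float) -> float:
    """Steady-state Tj = Ta + P * theta_JA."""
    return ambient_c + power_w * theta_ja_c_per_w

p_chip_w = 150.0     # chip power dissipation (placeholder)
theta_ja = 0.4       # degC per watt with heatsink and forced air (placeholder)
t_ambient = 35.0     # inlet air temperature in degC (placeholder)

tj = junction_temp_c(p_chip_w, theta_ja, t_ambient)
print(f"Estimated junction temperature: {tj:.1f} C")   # 95.0 C with these numbers
```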

Voltus and Celsius get you to a very accurate analysis of a single pizza box. These can then be summed up to get a whole rack, and racks summed to a data center. Of course, the analysis is also used to plan where to place objects like fans, baffles, and heatsinks.

Cadence presented an example of this sort of analysis of a pizza box at CadenceLIVE, which I then covered in my post Thermal Analysis of Protium X1. Here's an image from that post showing the analysis discovering a problem:

Of course, Protium X1 has been superseded by Protium X2, and you won't be surprised to know that we used the same approach to thermal analysis. And for the Palladium Z2. Both of these are rack-mounted systems intended to be installed in data centers. For more about Protium X2 and Palladium Z2, see my post Dynamic Duo 2: The Sequel. Cadence doesn't design actual servers; we buy those like everyone else. But Palladium and Protium face similar challenges, both at the local level, where adjustments to the cooling may be required, and at the global level, which is about how much heat the system dissipates when fully loaded and which the HVAC has to get out of the building.

Learn More

See the Voltus Power Integrity Solution product page.

And the Celsius Thermal Solver product page.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.


Tags:
  • Protium
  • Palladium
  • Voltus
  • thermal
  • power
  • datacenter