Paul McLellan

Best of CadenceLIVE 2020: Hyperscale Data Centers

25 Mar 2021 • 4 minute read

There is something in philosophy known as the Sorites paradox. If you have a heap of sand, and you remove a grain, is it still a heap? Well, sure. So, remove another grain. It's still a heap. But if you keep removing sand, does it always remain a heap? Well, no. A single grain of sand is not a heap. Sorites sounds like he should be an Ancient Greek philosopher, but actually, it's just the Greek word for a heap.

In the same way, I don't think that you can draw a clear line between a hyperscale data center and a server farm. However, in a hyperscale data center, there are literally tens or hundreds of thousands of servers, and since these days they are all multicore, that could be millions of cores. It's not just the servers; it is also the communication architecture, which is usually divided into the rack level (linking all the servers in a single rack to the router at the top), the data center level (sometimes called the spine), and the links from the data center to other data centers and to client machines.

One of the first companies to build data centers on this scale was Google. Its first servers looked like the one pictured. In fact, the picture is of an actual 1999-vintage Google server rack that you can see at the Computer History Museum in Mountain View (coincidentally, just near the Googleplex since, like much of Google, it is in an old Silicon Graphics building).

Hyperscale compute is one of the semiconductor technologies covered by customer presentations at last year’s virtual CadenceLIVE Europe event. To help share the learnings broadly, Cadence has consolidated several of the best presentations by key vertical segment: hyperscale, 5G, automotive, and artificial intelligence and machine learning (AI/ML). These presentations feature either customers who are designing hyperscale compute systems or technology that is directly applicable to the challenges that hyperscale presents. The semiconductors at the heart of hyperscale data centers, and of far-edge and near-edge applications, require the most advanced design techniques to power the innovation the cloud offers the world. These chips combine advanced nodes, large design sizes, massive hierarchies, and power concerns with tough schedules.

  • Application of Tempus Full-Chip ECO to Timing and Power on Large Designs (Marvell)
  • Voltus/Innovus IR-Aware Full Flow: Experience with IRdrop-Aware Placement and Reinforce PG (STMicroelectronics)
  • Complicated Clock Structure Analysis and Implementation with Innovus (ZTE / Sanechips)
  • Advanced Static Low-Power Verification Topics and Methodology (Intel)
  • Innovus Implementation Flow for Complex Hierarchical Designs (Renesas)

Arm at 4GHz

Let's face it, "Arm" and "4GHz" are not usually words you see in the same sentence. Arm made its reputation with low-power cores for mobile applications, with modest clock rates. But over the last few years, it has made a more serious foray into the data center world. For more on that, see my posts:

  • Arm Goes for It
  • The Start of the Arm Era
  • Liberate Trio on AWS/Graviton2 Instances
  • Designing Chips for Hyperscale Data Centers: IP

At CadenceLIVE Europe, Arm's Stephane Caneau, Olivier Rizzo, Bastien Metsu, and Florian Chailleux, along with Cadence's Ravi Andrew and Prashanth Lingalah, presented "How We Pushed Largest 5nm High-Performance Arm Core to 4GHz Frequency". The video is presented by Stephane with an appearance by Olivier.

As you might guess from his name, Stephane is French and is based in Sophia-Antipolis near Nice, where I lived for nearly six years.

I don't know which processor this was. Stephane only said, "the largest A-class core" (Cortex-A78?). The previous implementation, done by the original CPU team in 7nm, had achieved 3.2GHz.
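
To put those two numbers in perspective, here is a quick back-of-the-envelope sketch of what going from 3.2GHz to 4GHz means for the clock period. Only the two frequencies come from the presentation; the variable and function names are my own, and the calculation says nothing about how much of the gain came from the move from 7nm to 5nm versus the flow itself.

```python
# Frequencies quoted in the post (GHz). Everything else here is illustrative.
prev_ghz = 3.2    # previous 7nm implementation
target_ghz = 4.0  # 5nm target


def period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency in GHz (T = 1/f)."""
    return 1000.0 / freq_ghz


uplift_pct = (target_ghz / prev_ghz - 1.0) * 100.0
prev_period = period_ps(prev_ghz)
target_period = period_ps(target_ghz)
saved_ps = prev_period - target_period

print(f"Frequency uplift:  {uplift_pct:.1f}%")        # 25.0%
print(f"Period at 3.2GHz:  {prev_period:.1f} ps")     # 312.5 ps
print(f"Period at 4GHz:    {target_period:.1f} ps")   # 250.0 ps
print(f"Time to find:      {saved_ps:.1f} ps/cycle")  # 62.5 ps
```

In other words, every register-to-register path has to fit in 250ps instead of 312.5ps, so 62.5ps has to be found on each critical path.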

Some specific challenges of the process and the design that Stephane called out:

  • High-resistive metal stack
  • Extra-low Vt cells can help close timing, but leakage can become unacceptable if they are used too generously
  • Very high instance count with over 6M placeable instances
  • Large floorplan with a single clock
  • Many timing-critical RAM-dominated paths
  • Physical design requires 15 days and 100GB of memory

The runtime for the whole chip was obviously too long to enable experimentation at that level. Instead, Arm adopted a divide-and-conquer strategy (see the diagram). Arm also had access to the latest pre-release version of the digital full flow. The biggest contributors in the Cadence flow to meeting the frequency goal were:

  • Latest Genus Synthesis with advanced physical techniques
  • Genus iSpatial was one of the biggest contributors to Fmax uplift
  • Enhanced timing-driven placement engine
  • Via pillar aware optimization
  • Physical re-structuring and re-synthesis
  • Enhanced cluster skewing
  • Tighter integration between Tempus signoff and physical design
  • Power optimization to improve leakage without degrading maximum frequency

The graph shows the march to 4GHz. The drops at the start were due to switching to a new PDK and physical library that was more realistic but also more pessimistic in some areas.

Watch Stephane and Olivier's Presentation at CadenceLIVE Europe

Watch All the Hyperscale Videos

 
