
Paul McLellan

CadenceLIVE Google Keynote: Please Sir, I Want Some Moore

30 Jun 2021 • 7 minute read

The invited keynote for the first day of the recent CadenceLIVE Americas was by Partha Ranganathan, who is currently a VP and Engineering Fellow at Google, where he is the area technical lead for hardware and data centers, designing systems at scale. He actually titled his talk May I Have More Moore, Please? Thinking Outside the Traditional Hardware Box, but I went full-on Oliver Twist to keep the post title to a reasonable length.

Partha started with some history of processor performance. I covered much of the same ground in my three-post series that starts at Domain-Specific Computing 1: The Dark Ages of Computer Architecture. One Google-specific point was the motivation for developing the TPU: if every Android user made use of voice recognition for a couple of minutes a day, Google, already the largest hyperscale data center operator, would need to double the scale of its compute infrastructure.
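For a sense of scale, here is a back-of-envelope version of that argument. Every number below (user count, daily usage, per-second inference cost, existing fleet capacity) is an invented illustrative assumption, not a Google figure; the point is only how quickly "a couple of minutes per user" multiplies out.

```python
# Back-of-envelope sketch of the voice-recognition scaling argument.
# All constants are illustrative assumptions, not Google figures.
ANDROID_USERS = 2e9              # assumed number of active devices
VOICE_SECONDS_PER_DAY = 3 * 60   # "a couple of minutes" of speech per user
OPS_PER_SECOND_OF_AUDIO = 1e10   # assumed inference cost per second of speech

extra_ops_per_day = ANDROID_USERS * VOICE_SECONDS_PER_DAY * OPS_PER_SECOND_OF_AUDIO

# Assume the existing fleet already sustains a comparable daily workload.
EXISTING_FLEET_OPS_PER_DAY = 4e21

growth = extra_ops_per_day / EXISTING_FLEET_OPS_PER_DAY
print(f"fleet would need to grow by {growth:.0%}")  # prints "fleet would need to grow by 90%"
```

With these made-up numbers the fleet grows by roughly 90%, i.e., it roughly doubles; the conclusion is robust to the exact constants because the user count dominates.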

He moved on to security:

So, now we’ve talked about a few examples here. But I also want to bring up one other thing, which I sometimes refer to as negative Moore’s law. And this refers to some recent security vulnerabilities that we have seen, side-channel security vulnerabilities like Spectre, Meltdown, Foreshadow, and what we are finding with these security challenges is that the mitigation to these challenges often requires significant tradeoffs in performance. And not only are we seeing a slowdown in the improvement rate with Moore’s law, as we start getting more aggressive in addressing these security challenges and potentially challenges around reliabilities as well, we are likely to see a negative decrease or regression in performance that we see in our systems as well. So this combination of slowing supply on one side due to technology and other challenges, and this exploding demand that we see is basically the problem statement that we are going to talk about.

If you don't know about Spectre and Meltdown, I wrote about them when they were first revealed. But a better place to start is my post about a panel session from HOT CHIPS, Spectre/Meltdown and What It Means for Future Design.

Partha said he would talk about three things:

  • The very efficient design of new hardware
  • Being more efficient in the use of that hardware
  • Disaggregation and software-defined hardware

The Efficient Design of New Hardware

What he explained he really meant was "how do we build custom silicon accelerators?" The new normal is to build your own hardware and to target and customize it for specific workloads.

By doing that, you are going to be very transistor-efficient, area-efficient, and you’re going to be power-efficient, and correspondingly you are gonna be cost-efficient and that gets us to getting to the Moore’s law target that we want to look at.

There are lots of opportunities for doing this, big and small. Another Google example was that Google/YouTube built its own video encoder chip. Every time someone uploads a video to YouTube, it has to be encoded in a range of formats so that it can be replayed on everything from big-screen TVs to under-powered legacy smartphones. Each of those requires a different encoding. And you can look around and see companies seizing this opportunity for their own businesses. For example, see my posts:

  • HOT CHIPS: The Tesla Full Self-Driving Computer
  • Arm Goes for It (which touches on the Apple M1)
  • Climbing Annapurna to the Clouds (about AWS Graviton)
  • And, of course, Inside Google's TPU

Efficient Use of Hardware

But there is a missing middle that Partha calls the "attack of the killer microseconds".

If we think about how computer systems are designed, I would argue that we have the two extremes very well covered. On one hand we have nanoseconds, where at the hardware level we have a whole bunch of techniques, whether it’s pipelining, or out-of-order execution, or prefetching, or whatever. On the other hand, we have milliseconds, where again we have a whole bunch of techniques at the systems level: task scheduling, context switching, and so on. But we are increasingly entering the regime of microseconds, and microseconds happen in multiple ways. If you think about fast networking across a typical data center, that's roughly a microsecond; if you think about the new world of accelerators we are entering, accessing these accelerators across a PCIe or a CXL bus takes roughly microseconds. Or if you think about the new emerging non-volatile memory technologies, again, accessing these memories is on the order of microseconds. And I think we have some very interesting opportunities to think about what acceleration means for these new microsecond-level challenges. This could include some very interesting optimizations like scheduling in hardware, it could be hardware offloading to optimize for latencies, or it could also be support for asynchronous programming models and so on. So, again, lots of opportunities.
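A toy latency model makes the microsecond problem concrete. The constants below (link round-trip time, per-request CPU work) are invented for illustration: if a core blocks for every microsecond-scale round trip, the latency dominates total time; if requests are issued asynchronously and completions overlapped with useful work, the round trip is effectively paid once.

```python
# Toy model of the "killer microseconds": a core issuing requests to an
# accelerator over a microsecond-latency link. All numbers are assumptions.
LINK_LATENCY_US = 2.0    # assumed PCIe/CXL-style round trip, in microseconds
WORK_PER_REQ_US = 0.5    # assumed CPU time to prepare each request
N_REQUESTS = 1000

# Blocking: the core stalls for the full round trip on every request.
blocking_us = N_REQUESTS * (WORK_PER_REQ_US + LINK_LATENCY_US)

# Asynchronous/pipelined: requests are issued back to back and completions
# are handled as they arrive, so link latency overlaps with useful work.
async_us = N_REQUESTS * WORK_PER_REQ_US + LINK_LATENCY_US

print(blocking_us, async_us)  # prints "2500.0 502.0" (microseconds)
```

In this made-up model the asynchronous version is about 5x faster, which is the shape of the argument for asynchronous programming models and hardware-level scheduling at microsecond granularity.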

Disaggregation and Software-Defined Hardware

It turns out that Partha and some colleagues at Google have actually written a book about this, The Datacenter as a Computer.

His basic approach to efficiency is to build efficient hardware blocks and compose them in software, so a software-defined infrastructure.

So what you see here in this picture is basically how disaggregation works. You can take an individual server and you can break it up, or disaggregate it, into individual components, and you can start having these pools of systems: pools of compute, pools of memory, non-volatile memory, pools of storage, and so on. And what you now have is an architecture that allows you to compose these in whatever form you want, based on whatever use cases you’re looking at, whatever workloads you’re looking at. And the power behind such an architecture is basically that you now provide incredible amounts of efficiency through simplicity and modularity.
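A minimal sketch of that composition idea, with invented pool names and capacities: logical machines are carved out of shared pools in whatever ratio each workload needs, rather than being fixed at server-assembly time.

```python
# Minimal sketch of software-composed, disaggregated resources.
# Pool names and capacities are invented for illustration.
pools = {"cpu_cores": 4096, "dram_gb": 65536, "flash_tb": 1024}

def compose(pools, request):
    """Carve a logical server out of the shared pools, if capacity allows."""
    if any(pools[res] < need for res, need in request.items()):
        return None  # at least one pool lacks capacity
    for res, need in request.items():
        pools[res] -= need
    return dict(request)

# Different workloads draw different shapes from the same pools: a
# memory-heavy ML job and a compute-heavy web tier coexist without being
# constrained to a fixed per-server resource ratio.
ml_job = compose(pools, {"cpu_cores": 16, "dram_gb": 1024, "flash_tb": 2})
web_tier = compose(pools, {"cpu_cores": 64, "dram_gb": 128, "flash_tb": 1})
print(ml_job, web_tier, pools)
```

The efficiency argument in the quote corresponds to the fact that unused capacity stays in the shared pools for the next request, instead of being stranded inside a fixed-shape server.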

Hardware and Software "in Perfect Harmony"


Partha thinks that this is where the cloud shines and provides future opportunities for the new era of Moore's Law. In particular, "The cloud gives you a seat in the new Moore’s Law roller coaster, without the twists and turns of constant change."

Next, Partha moved on to "ML for ML", which is machine learning for Moore's Law: using machine learning to improve the compute infrastructure, either just to better optimize the compute environment or to better match it to the workload over time.

He moved on to talk about three individuals from the past whose thinking permeates computer architecture today:

  • John von Neumann: "I think we have an opportunity to rethink the traditional von Neumann construct on computer architecture, and arguably some of the work with systolic arrays in ML is already starting to do that".
  • Maurice Wilkes (the head of the computer science department when I was at Cambridge, and my lecturer in both networking and numerical methods): "He is widely credited with the modern-day cache hierarchy. And again, with technology scaling challenges, with the new innovations, new memory technologies, non-volatile memories, and so on, I think the memory hierarchy is ripe for some significant innovation. And I am excited to see what happens there."
  • Gene Amdahl: "In addition to the very popular Amdahl’s law of diminishing returns, he is also very well known for the Amdahl’s rules of system balance. And once again, I think we have an incredible opportunity to rethink what system balance means in this new context of emerging workloads, and emerging systems that we talked about".

Closing

So, in closing, I think we talked about how we are entering this new era of Moore’s law, which is defined by two key dimensions. One is around how we can efficiently design hardware: we talked about accelerators, and we talked about the importance of agility and of a rich, vibrant community and ecosystem around that. The second dimension was around how we efficiently use the hardware that we designed: we talked about new architectural concepts like disaggregation and software-defined hardware. And throughout, we talked about the cross-cutting themes of using machine learning and cloud computing to amplify the benefits from all the optimizations that we have in this new era of Moore’s law innovation.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email