
Paul McLellan

DAC Wednesday: Denali, Patterson on Architecture, Rowen on Deep Learning, Analog Reliability Lunch, Bagpipes

28 Jun 2018 • 13 minute read

Tuesday evening finished with the Denali Party. Since it has been 8 years since Cadence acquired Denali, you may have forgotten who they were. They were experts in DDR memory interface IP and verification IP, and theirs was the first IP that Cadence added to its portfolio. The Denali Party goes back to 1999. If you want to read about how it started, read my post Party Like It's 1999—How the Denali Party Started.

Dave Patterson Keynote

Dave Patterson gave a version of his recent Turing Award acceptance presentation, A New Golden Age for Computer Architecture. He received the 2017 Turing Award (presented in 2018) along with John Hennessy for the invention of RISC. For details of that, see my post Hennessy and Patterson Receive the Turing Award.

His keynote was divided into three parts. The first part was "50 years of computer architecture in 15 minutes." It was a shortened version of his keynote from the RISC-V Symposium in Shanghai last year, which I covered in:

  • Fifty Years of Computer Architecture: The First 20 Years
  • Fifty Years of Computer Architecture: The Last 30 Years

The era of RISC truly began in 1983, when RISC-II (from Berkeley) and MIPS (from Stanford) were presented at ISSCC and it suddenly became clear that:

A group of grad students at Berkeley and Stanford could design a better microprocessor than the industry.

Victory was total. There hasn't been a new CISC instruction set proposed since that day, and even x86 is implemented as RISC under the hood, to take advantage of everything learned while preserving binary compatibility.

The second part of Dave's presentation looked at the challenges. We have long passed the point where Dennard scaling gave us faster and faster processors without having to pay anything in power. And Moore's Law has slowed to a crawl (for processors at least). We stopped being able to ramp up the clock rate and switched to multi-core. But, due to Amdahl's Law, there is a limit to how many cores can be used effectively, as the sketch below illustrates. We have also run out of architectural tricks to improve performance for general-purpose computing.
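
To make the Amdahl's Law point concrete, here is a minimal sketch (my illustrative numbers, not figures from Dave's talk) showing how the serial fraction of a workload caps multi-core speedup:

```python
# Amdahl's Law: speedup = 1 / ((1 - p) + p / n) for parallel fraction p on n cores.
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Even a workload that is 95% parallel saturates around 20x,
# no matter how many cores you throw at it.
for cores in (2, 4, 16, 64, 1024):
    print(f"{cores:>5} cores -> {amdahl_speedup(0.95, cores):5.1f}x")
```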

The other challenge is security. At various times we have had strong security built into processors (we had a capability machine at Cambridge when I was an undergraduate forty years ago), but since operating systems didn't generally use those features, they stopped being included. With microprocessor architecture:

We didn't worry about timing provided we got the right answer. But it turns out that timing differences can leak information. Spectre leaks 10 kb/s...I thought it would be more like one bit per century
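
The timing point is general. As a toy illustration of the principle that execution time can depend on secret data (this is an assumed, simplified demo, not Spectre itself, which exploits speculative execution and cache state), consider a comparison routine that exits early on the first mismatch:

```python
# Hypothetical demo: an early-exit comparison leaks how many leading bytes match.
import time
import secrets

SECRET = secrets.token_bytes(16)

def insecure_compare(guess: bytes) -> bool:
    """Runtime grows with the number of leading bytes that match the secret."""
    for g, s in zip(guess, SECRET):
        if g != s:
            return False
        sum(range(10_000))  # exaggerate per-byte work so timing is measurable
    return len(guess) == len(SECRET)

def time_guess(guess: bytes, trials: int = 50) -> float:
    t0 = time.perf_counter()
    for _ in range(trials):
        insecure_compare(guess)
    return time.perf_counter() - t0

wrong = bytes([SECRET[0] ^ 0xFF]) + SECRET[1:]   # wrong first byte: exits at once
partial = SECRET[:4] + bytes(12)                 # first four bytes correct
print(f"wrong first byte: {time_guess(wrong):.4f}s")
print(f"four bytes right: {time_guess(partial):.4f}s  (measurably slower)")
```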

You can read more about the Spectre and Meltdown vulnerabilities in my posts:

  • Spectre and Meltdown: How Can It Affect Both Intel and Arm?
  • Spectre and Meltdown: An Update
  • Paul Kocher: Differential Power Analysis and Spectre

Security is an embarrassment to the computer architecture community. So those are the two challenges: the end of general-purpose speedup, and completely broken security.

Dave wrapped up part two with a quote from John Gardner in 1965:

What we have before us are some breathtaking opportunities disguised as insoluble problems

In part three, Dave looked at the opportunities. He started by looking at a simple problem: matrix multiply written in Python. Modern scripting languages like Python are very efficient for programmers (easy to write code) but not efficient for execution. Rewriting it in C gives a speedup of 47X. Parallelizing the loops (pretty easy for matrix multiply) takes that up to 366X, optimizing memory handling takes it to 6,727X, and using x86 SIMD (vector) instructions gets to roughly 62,000X. Matrix multiply is maybe a best case, since it is easy to optimize, but Dave pointed out that even if you only get 1000X or 100X, that would be huge.
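
You can see a microcosm of the same gap on a laptop. Here is a minimal sketch (not Dave's actual benchmark) comparing a naive pure-Python triple loop with NumPy's BLAS-backed multiply, which already applies the cache-blocking and SIMD tricks described above:

```python
# Naive Python matrix multiply vs. NumPy's optimized (BLAS-backed) multiply.
import time
import numpy as np

n = 200  # modest size so the pure-Python version finishes in a few seconds
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def matmul_naive(a, b):
    """Textbook O(n^3) matrix multiply in pure Python loops."""
    rows, inner, cols = a.shape[0], a.shape[1], b.shape[1]
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            aik = a[i, k]
            for j in range(cols):
                c[i][j] += aik * b[k, j]
    return c

t0 = time.perf_counter()
matmul_naive(a, b)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
_ = a @ b  # blocked for cache, vectorized, possibly multithreaded
t_blas = time.perf_counter() - t0

print(f"naive: {t_naive:.3f}s  numpy: {t_blas:.4f}s  speedup: {t_naive / t_blas:.0f}x")
```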

Domain-specific Architectures (DSAs) can win by tailoring the architecture to the domain. One big important domain is machine learning. The number of papers produced on machine learning is growing faster than Moore's Law, with 50 papers a day right now. Google built their own chip, the Tensor Processing Unit (TPU), which has been used by billions of people to speed up search, voice recognition...and beating the Go world champion (for more on that, see my post Deep Blue, Alpha Go, and Alpha Zero). The TPU delivers roughly 30-80 times better performance per watt than a CPU or GPU.

Next Dave switched to open instruction sets. RISC-V started as "a three-month project" to design an ISA for teaching courses, since there were too many restrictions on the only other "obvious" choices, Arm and x86 (technically the AMD 64-bit ISA). Dave told a bit of the story, but you can read about Krste Asanović's presentation covering the whole story from DAC 2016 in RISC-V: Instruction Sets Want to Be Free.

RISC-V is a great base for security work, since it is open and it is simple. Complexity and obscurity are the enemies of security. With a RISC-V implementation, it is much more straightforward to prove there are no trapdoors.

Finally, Dave talked about how to design these DSAs with agile hardware development. A table from his slides showed just how few lines of code are needed for even a fairly complex processor (BOOM is an out-of-order execution processor) and how much code can easily be reused. For more about that see:

  • RISC-V and Chisel: A Raven Has Landed
  • Agile Development of Custom Hardware

So all in all, it is the start of a new golden age for computer architecture. These are not insoluble problems but a breathtaking opportunity.

Also, let me put in a plug. If you have read Breakfast Bytes over the last couple of years, you have already seen: the Turing Award, 50 years of computer architecture, the RISC-V ISA, Alpha Go, agile hardware development, Spectre & Meltdown, and more. You already know almost everything that Dave talked about.

Chris Rowen

Chris Rowen was the founder of Tensilica (which Cadence, of course, acquired). He is now the founder of BabbleLabs, although most of what he talked about in his presentation is more related to Cognite Ventures, his open resource on AI startups. As it turns out, Chris had already made an appearance, since he was also a founder at MIPS and appeared on a slide in Dave's keynote.

There is a lot of hype about AI, with one page hit on Google for every person in the US, India, and China. There are 11,300 AI startups and 16,500 neural network technical papers, almost all of them from the past 24 months. Chris said he has never seen a technology go from zero to light-speed so fast.

Since it was DAC, he also took a look at the impact in the semiconductor universe. He identified three areas:

  1. Using machine learning in EDA, which has moderate impact. There are many tasks where statistically close is good enough, such as floorplanning, or synthesis parameter discovery.
  2. EDA in machine learning chip design, which he thinks has just a small impact. Machine learning chips just aren't that different from other complex chips.
  3. Design of machine learning systems, with a big impact. Deep learning is really a new computing model. "The neural network is the new circuit" and we can apply EDA-like tools to these networks and arrays.

One reason that vision is so central to AI is the pixel explosion. There are 20B image sensors out there, more than there are people, obviously. So there aren't enough people to look at the images. 99% of all new (raw) data is pixels (and essentially all the rest is audio). Increasingly, in things like autonomous driving, the data is being looked at by neural networks to make decisions (about lane changes, or whether a person on a security camera is a threat).

One way of processing voice (or sound in general) is to turn it into a spectrogram and then use vision processing techniques on the spectrogram. The same issue with cameras exists with sound, since there are also 20B microphones out there. Chris showed an impressive noise reduction video that BabbleLabs had made.
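
To show what that pipeline looks like, here is a minimal sketch (using SciPy on a synthetic signal, not anything BabbleLabs-specific) of turning audio into a spectrogram, i.e. a 2-D time-frequency "image" that vision-style networks can then process:

```python
# Turn a 1-D audio signal into a 2-D log-magnitude spectrogram.
import numpy as np
from scipy.signal import spectrogram

fs = 16_000                                 # 16 kHz sample rate, typical for speech
t = np.arange(0, 1.0, 1 / fs)
# Synthetic stand-in for voice: a rising tone plus background noise.
audio = np.sin(2 * np.pi * (200 + 300 * t) * t) + 0.1 * np.random.randn(t.size)

# Sxx is a (frequency bins x time frames) array: effectively a one-channel image.
f, frames, Sxx = spectrogram(audio, fs=fs, nperseg=512, noverlap=256)
log_mag = 10 * np.log10(Sxx + 1e-12)        # log scale, as usually fed to a CNN

print(log_mag.shape)                        # e.g. (257, ~61)
```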

A look at deep learning startups shows that 2/3 of the 264 focus on the cloud. Half do vision. Embedded startups are dominated by vision, with over 90% there. Speech is only just starting and is an order of magnitude smaller. There are lots of deep learning chip startups, over 25 serious ones, unlike anything we've seen for 15 years or more. Half are in the US (mostly California), then the UK, Israel, China, and Canada, and then a long tail.

 Chris's success formula for an AI startup:

  1. Access to big training data and tools
  2. End application know-how
  3. Exploitation of parallel compute potential (in the cloud or in multicore processors)
  4. Get to market quickly...and often

At one level, deep learning silicon is easy, especially for inference. But it is also hard, since there are so many impediments to efficiency, such as power limits, memory bandwidth limits, and so on. He feels that "silicon availability may be getting ahead of deployable applications." His chart showed where all the silicon startups are playing.

Echoing Dave's keynote from earlier, Chris pointed out the enormous improvement over standard processors and the breadth of applications in device vision and speech. This is a new computing model, no longer anything like a Turing machine. Chris said that new tools for AI are a huge opportunity for EDA-like algorithms to be delivered as tools to support this fundamental change in software development methods.

Analog Lunch

After verification on Monday and digital on Tuesday, it was time for the Wednesday Analog Lunch. Steve Lewis moderated a panel with:

  • Elyse Rosenbaum from U of Illinois
  • Mark Porter of Medtronic
  • Saverio Fazzari of DARPA
  • Lakshmanan Balasubramanian of Texas Instruments
  • Vinod Kariat of Cadence

The title of the panel was Meeting Analog Reliability Challenges Across the Product Life Cycle. 

One thing all the panelists agreed on is that digital is way ahead of analog in terms of having a standard test methodology that is part of the flow, having requirements for measuring whether test and reliability requirements are met, and so on.

The participants all had slightly different concerns.

Elyse works mostly on ESD issues, which are actually pretty well understood. We have machine and human-body models, we know what the input waveform is, and we can simulate it and see whether the mitigation we have in place works. Big digital chips, which have lots of power domains, are a sort of worst case for ESD: since the grounds are isolated, each domain needs separate protection. Automotive brings new issues, such as RF interference.

Mark (Medtronic) said that they build implantable medical devices. Their environmental issues are simple, since the body doesn't vary much in temperature. Their main constraint is that they have to run at extremely low power. The voltages (and frequencies) are so low that they don't really worry much about wearout of transistors, even over the extended lifetime. Their big issue is screening out defects.

Also, when we talk about low power, we are orders of magnitude below the standby current of most devices. We can't just hang test circuitry on, because it adds capacitance and so drains power. We don't operate at super-high frequencies, so the mechanisms that cause transistors to go bad don't affect us. But screening out latent defects is the big issue, to make sure we get a functional test out, so we can force a failure early and don't ship it in our products.

Saverio (who said he was there more as a representative of Booz Allen than of DARPA) said that they have a different set of problems. Parts have to last 20 years. But their volumes are so small that statistical techniques don't apply; there isn't enough data. So they need tools they can rely on to analyze the reliability problem.

Lakshmanan (TI) said they have lots of chips with lots of digital, lots of analog, and RF. So even testing (or simulating) for basic functionality is a problem. ESD is a big problem since they usually have 10-15 power domains. Five years back, designers didn't even know what they needed to do for reliability. The functionality is clear, but reliability challenges need to be formalized and specified so they can be checked: things like noise coupling causing unexpected behaviors.

Vinod thought that these issues are growing in importance for a couple of reasons, since the basic issues themselves are not new. One is that the number of designs has gone up a lot with IoT; these designs all have analog, need to be reliable, and many live in hostile environments. The other is ADAS and autonomous vehicles, which have increased the focus on safety-criticality and are driving Cadence to create and deliver commercial solutions.

A big issue, which several people talked about, is latent defects. Digital has ways of running tests at higher voltages, which shakes out issues that might turn into defects later. Analog needs something similar to shake out latent defects that may become an issue later. That requires better models for performance reasons...but it is clear analog designers are very conservative and don't trust fast models. So there is a chicken-and-egg problem to solve there too. Better models also require better data from the foundries, which they are reluctant to release.

Steve's next question was open-ended, asking each panelist for the #1 thing in a tool that would add the most to reliability. What is "the" thing?

Mark (Medtronic) said that:

For me, there is one main problem: the latent defect. The digital guys have an automated flow; we can raise the voltage and screen out latent defects. No automation exists in the analog world, and that is the biggest thing that affects us in the field. If we had a way to generate the vectors, raise the voltage, and find the breakdowns, that would go a long way.

Elyse (U of I): For me, I see the biggest challenge as developing a modeling and simulation capability for system-level ESD analysis for automotive. It is critical and it is missing.

Saverio (DARPA):

Take two steps forward: latent defects are what kill you later. The digital guys have a strategy; it would be nice to have a first step toward that for analog. I have old analog designs without any data; how do I recreate the behavior, not just from a functional perspective but for reliability?

Vinod, who had mostly been listening, got the final question, which was basically what is Cadence going to do about it?

We have the fundamental components: defect analysis, thermal analysis, aging models. They are the right building blocks. Implementing more and more efficient ways of analyzing, so you don't spend a year in simulation, is a good step. Other things are handled in piecemeal fashion, EMI, ESD, and so on, which are separated out and handled by experts. "That guy in the lab that does EMI compatibility testing, and if there is a problem he'll stick more metal around it or something." Then there is defect modeling, whether for better testing or for better understanding fault mechanisms to build in functional safety. Improving that is going to be important, and automotive is driving (see what I did there?) it.

Machine Learning and the World Cup

 Here is the MIT Technology Review on machine learning, since that seems to be today's topic. Using sophisticated algorithms, they predict that Germany will win the World Cup. Err...I predict they won't even make it out of the group stage.

Bagpipes

I talked above about the Denali party, a tradition Cadence inherited and then kept going. Another tradition is the bagpipes that end DAC. It used to be HP, in the workstation era, who played them really loudly on their speakers. Somehow Forte picked it up when workstations lost out to x86 servers. Cadence acquired Forte, and so here we are.

Or watch the video. Amazing Grace is actually a song about a guy in the slave trade who decided that was maybe not something he should be doing. Read the lyrics sometime. If you want to see the most amazing version of this, go to the Highland Games on Labor Day weekend, organized by the San Francisco Caledonian Society at the Alameda County Fairgrounds. You will see a solo piper play the first verse, and then 600 pipers will start. It will bring tears to your eyes.

Watch the video from last night:
