
Author

Paul McLellan

Community Member


DAC 2022: Day 3

14 Jul 2022 • 12 minute read

On to day 3, and the final day that I will be posting about. My posts on the first two days are:

  • DAC 2022: Day 1
  • DAC 2022: Day 2

Machine Learning for Real: Why Principles, Efficiency, and Ubiquity Matter

The keynote on Wednesday was by Steve Teig, CEO of Perceive. His presentation was titled Machine Learning for Real: Why Principles, Efficiency, and Ubiquity Matter. If I had to summarize it in a single sentence, it would be something like "neural networks today are impressive at party tricks but for serious use, everything about them is being done wrong." Let me preface what I'm going to say by admitting that I am not a deep expert on neural networks. I've never used PyTorch or TensorFlow or attempted to build my own network for image recognition.

His opening slide gave his own summary of what the talk was going to be about:

  • Deep learning could benefit from more thinking and less folklore.
  • Efficiency matters—throughput per dollar or watt.

Efficiency is the driver of disruption in computing (Moore's Law being the most obvious example). But now with deep learning, we are going from 1M parameters to 1B parameters to 1T parameters. And where are these ridiculous models running? On GPUs, then racks of GPUs, then buildings full of GPUs, then campuses full of buildings full of GPUs.

The rest of his talk was structured around what he calls "a myth, a misunderstanding, and a mistake". Deep learning as practiced today is the opposite of efficient, more like the "anti-efficiency of deep learning". It turns out that these bloated models are also rather untrustworthy. Steve had some examples of adding carefully structured noise to fool image recognition networks. I've covered this before, so if you want to see examples then look at my post Fooling Neural Networks.
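
To make the point concrete, here is a toy sketch of my own (not code from the keynote): a bare linear classifier standing in for an image network, where a tiny, carefully chosen per-pixel perturbation is enough to flip the decision. Real attacks such as FGSM do the same thing to deep networks using gradients.

```python
# Toy adversarial-noise illustration on a linear "image" classifier.
# My own minimal sketch, not code from the talk; sizes and seeds are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
w, b = rng.normal(size=784), 0.0            # weights of a toy 28x28 "image" classifier
x = rng.normal(size=784)                    # an input image (flattened)
score = w @ x + b
print("original class:", int(score > 0))

# Nudge every pixel slightly in the direction that hurts the score the most.
eps = (abs(score) + 1.0) / np.abs(w).sum()  # just enough budget to cross the decision boundary
x_adv = x - eps * np.sign(w) * np.sign(score)
print("per-pixel perturbation:", round(eps, 4))              # tiny relative to pixel scale ~1
print("class after perturbation:", int(w @ x_adv + b > 0))   # the decision flips
```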

Another myth is that average accuracy is what matters. But high average accuracy is easy to achieve. For example, a model that always says that nobody is breaking into your house is 99.5% accurate, since break-ins are rare. But that is obviously completely useless as a model. Better than average accuracy is "don't ever make a big mistake". This should be driven by the loss function being based on severity, not frequency. The training set needs to be based on the amount of unique information, and balancing won't get you there (Steve's example: Do you have people with beards? People with beards who are bald? People with beards who are bald and wear glasses?).
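
As a small illustration of the burglar-alarm point (my own sketch, with an assumed 0.5% break-in rate and made-up severity costs), a model that always answers "no break-in" gets excellent average accuracy but a terrible severity-weighted loss:

```python
# Why average accuracy misleads on rare events, and what a severity-based
# cost does differently. Illustrative only; the rates and costs are assumptions.
import numpy as np

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.005).astype(int)  # 1 = break-in, roughly 0.5% of samples
y_pred = np.zeros_like(y_true)                     # model that always says "nobody is breaking in"

print("average accuracy:", (y_pred == y_true).mean())  # ~0.995, yet the model is useless

# Weight mistakes by severity: assume a missed break-in (false negative)
# is 100x worse than a false alarm (false positive).
FN_COST, FP_COST = 100.0, 1.0
cost = np.where((y_true == 1) & (y_pred == 0), FN_COST,
                np.where((y_true == 0) & (y_pred == 1), FP_COST, 0.0))
print("severity-weighted loss:", cost.mean())           # dominated by the missed break-ins
```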

The second big issue is the fallacy that neural networks are expressive. In fact, they cannot represent things with memory. The notion that they are Turing complete comes from assuming that all wires carry infinite precision, not just 32 bits. The heavy lifting in a neural network is done by the activation function, mostly ReLU. So they actually have expressive power similar to the Linux grep command (basically, regular expressions).

The third fallacy, which I found most interesting, is that compression hurts accuracy. The definition of randomness is the inability to compress the data because there is no regularity, and there's a whole discipline, information theory, that looks at this in detail. In fact, learning is a form of compression, discovering regularity, and any regularity can be used to compress the data.

Today we have a lot of ad hoc techniques for compression (8-bit integers instead of 32-bit floating point, for example). Kolmogorov defined the compression possible as the length of the shortest computer program that can generate your data. In practice, 100X compression of neural networks is possible.
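
As a concrete (and deliberately simplified) example of one such ad hoc technique, here is a sketch of symmetric post-training quantization of float32 weights to 8-bit integers. Real flows in PyTorch or TensorFlow Lite are considerably more sophisticated, but the size arithmetic is the same:

```python
# Minimal sketch of post-training weight quantization: store int8, rescale at inference.
# Purely illustrative; the layer size and weight distribution are assumptions.
import numpy as np

weights = np.random.default_rng(1).normal(0.0, 0.2, (256, 256)).astype(np.float32)

scale = np.abs(weights).max() / 127.0                              # map the float range onto int8
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)  # stored weights, 4x smaller
restored = q.astype(np.float32) * scale                            # reconstructed for inference

print("float32 bytes:", weights.nbytes, "  int8 bytes:", q.nbytes) # 262144 vs 65536
print("max abs error:", float(np.abs(weights - restored).max()))   # small quantization error
```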

He had a little chip that was running ResNet-50 with a power of just 10 mW.

Steve's conclusion went back to where he started:

  • Making machine learning into a science
  • Finding structure in data, so highly compressed models will be more predictive, not less 


Life Science Panel

Actually, the panel was titled What Is the Role of the EDA Community in Future Life Science Breakthroughs?


This was one of the most fascinating presentations I've attended recently. I will write about it in its own post next week. To pique your interest, here is the closing sentence of Lou Scheffer (on the panel and an ex-Cadence fellow) when he last spoke at DAC:

If you want to improve electronics, don't study electronics. You should study the brain.

Bespoke Silicon: Tailor-Made for Maximum Performance

This is a topic I covered in a panel session earlier this year in my post DesignCon: Bespoke Silicon. John Lee was on that panel, but this time he was the moderator. Prashant Varshney of Microsoft's Azure was on both panels. Cadence's Kam Kittrel (an ex-Ambit colleague of mine going back nearly 25 years) was the second panelist, recently promoted to vice-president of product management for digital and signoff. And this time the third panelist was Mathew Kaipantatu of...well, his company asked that his affiliation not be published. I'll honor that even though I could read it off his badge.

The term "bespoke silicon" is like a bespoke suit: it implies the chip is done specially rather than off-the-peg like an x86 microprocessor. I like the term since "custom" would work, too, but that word also gets used in the sense of handcrafted layout done in a layout editor like Virtuoso, which bespoke silicon (typically) is not. As usual, when I cover panels like this, paragraphs that start "Q" are questions asked by John the moderator. Anything in [brackets] is my commentary, not something the panelists said.

Q: A good example of bespoke silicon is Apple's chips. But why is it happening?

Prashant: Semiconductor development work goes from low volume to high volume. When you look at high volumes, like the data center, everyone is looking at how to optimize on a per-dollar basis. When people start to look at the entire system, and then look at the pieces that go into it, that's the inspiration to look at custom silicon to drive the cost down.

Kam: It is an interesting change driven by the hyperscalers since the economics has changed. Their function is to provide a software service with highly scalable hardware. Before, semiconductor companies would make a chip and sell it to the system companies. Annapurna was one of the first to build their own chips, and it turned out to be about replacing the hypervisor [software] and offloading it into hardware. We see this in other markets too, most obviously mobile. 5G and AI have changed the game, and many people are building their own AI architectures and hardware to drive a specific software stack.

Q: Why isn’t everyone doing bespoke silicon?

Mathew: It's not simple. You need to be sure it will hit performance, and you have to balance that against the rewards. Is it worth it? It makes sense for many people. You also need to worry about timelines: bespoke silicon is made for niche markets and you need to make sure you can deliver on time for the market.

Q: Tesla has designed some custom silicon that they've talked about publicly. What about other auto manufacturers?

Kam: An edge device, the car. A data center for processing the data. And machine learning AI accelerators. I'm not sure everyone has to follow, but there will probably be some interesting partnerships. If you are, say, BMW, do you build data centers all over the world or partner? But I think more people will follow Tesla's lead.

Prashant: Most companies pursuing autonomous driving are doing their own bespoke silicon. It is possible that there might be some standardization over time, but for now, we are not there yet.

Q: Where do ASIC companies fit into this? They have been building silicon for clients for many years.

Mathew: Even someone doing bespoke silicon might not do it all the way from RTL to layout, maybe just the RTL and then have someone else do the rest. That's where the ASIC companies might come in.

Kam: I'm old enough to remember when ASIC was dead, and ASIC companies struggled. Now their problem is they don't have enough people to do all the work. It's staggering how many advanced-node projects are going on right now. AI chips are in the wild west of AI architectures, since each is tied to the specific task to be performed. There will probably end up being some standard architectures for AI like there are for CPUs and GPUs.

Q: If you think about large players, some have focused on merchant silicon like x86 and GPUs. Bespoke silicon seems contrary to that trend. Is there room for both?

Prashant: Definitely room for both. Every platform starts with merchant silicon and then, depending on market size, people see how to optimize.

Kam: This plays into 3D-IC and chiplets. It is really expensive to do a high-end CPU, so maybe it makes sense to add value in other parts of the system.

Mathew: There is a need for general-purpose compute, too. Bespoke doesn’t make sense for all applications. Systems may mix general purpose and bespoke, perhaps in advanced packaging.

Q: Are chiplets the new form of hard IP? Where are we on that journey?

Mathew: One thing is, say, a chiplet family with five chips. But to build the next version, you don't need to rebuild all five, maybe you replace just one with bespoke silicon.

Q: For bespoke silicon, a lot of designs are using the most advanced technologies. What are the challenges?

Mathew: A big challenge for teams getting started at the most advanced nodes is that you have no pre-existing data; many people doing bespoke silicon are doing it for the first time.

Kam: One of the biggest challenges we see is having enough engineers to do the job. Plus it’s a lot different doing 5nm than 28nm. That’s been a big challenge. Everyone is headcount limited at this point.

Q: How will programmability and programmable logic such as FPGAs play into the future? Both Intel and AMD have acquired FPGA companies [Altera and Xilinx].

Prashant: Programmable logic has improved performance a lot in recent years. There are combinations with GPUs embedded on the same chip. That is giving designers the ability to run through different design styles and algorithms. The gap between FPGA and optimized ASIC is narrowing.

Kam: FPGAs are now targeted to specific applications, not just a sea of gates. The big downside is always power, though.

Prashant: Another application is secure silicon where you have confidential algorithms embedded. You don't want to implement it as an ASIC.

Q: Software fits into this, so how do the hardware and software teams work together?

Mathew: A lot of bespoke silicon is just there to run the software, to take software that was running on general-purpose compute and take it to the next level. So there is a need for strong teamwork between hardware and software. This is not easy since they speak different languages.

Kam: The hypervisor example from AWS is something we may see more of, taking what was running in software, pushing it down into hardware, so it ran with lower compute per watt.

Q: If you look at bespoke silicon, a lot of the teams are just getting going. Are they using the best methods?

Kam: They are not novices. Even if the company has not done silicon before, they hire experts. Those experts tend to be risk takers pushing the envelope.

Mathew: You are really pushing PPA if you are doing bespoke silicon, otherwise there is no point. But you often address a niche market so you also need to be able to go back and make quick changes.

Prashant: A lot of bespoke silicon is being done by companies who have not done silicon before, but then they hire experts in the field, so there is a compromise between agility and traditionalism from the industry coming in.

Q: There is a global war for talent. Has that been affected by the bespoke silicon trend?

Mathew: It is very different to create a chip for a mobile phone versus a laptop or a car. There are more opportunities being opened up for domain experts to move into hardware.

Prashant: The ecosystem is making it easier. EDA has evolved over the last two decades, so now the flows are largely automated. Same with infrastructure and what you can do with the cloud, rather than needing to raise $10M and build server farms. The ecosystem is a lot more friendly and is lowering the bar for entry.

Kam: In the last few years, emulation sales have been incredible. Teams need to get to the functionality they need, and that includes software.

Q: Historically, the semi industry has been cyclical. Given where we are with bespoke silicon, what will the future look like?

Kam: The bespoke silicon trend will keep going but shift focus. I think that for AI chips there will be a shakeout. Maybe in networking, too, maybe some barebones networking designs that become widely used. But as we get more compute moving to the edge, bespoke silicon will play there.

Prashant: At the end of the day, it is about return on investment. There are going to be application segments that can be served with something more application-specific, but everyone doing their own might not be the answer. There is probably a middle ground where companies become centers of excellence.

Mathew: There have been lots of success stories with bespoke silicon so that encourages it. But not everyone will choose to or be able to go that route.

Q: What is one thing that will become part of semiconductors in the next five years that is surprising?

Prashant: Domestic manufacturing. I think it will be a big deal. The US government is laying a huge emphasis on this, and the $52B CHIPS Act is a huge deal. Also, the industry is starting to look at secure silicon, with all the attacks that have been happening. How do you develop secure silicon in a secure manner? All commercial applications are vulnerable.

Mathew: It may not be surprising, but performance per watt will become more and more important. Previously driven by thermal but in the future by sustainability.

Kam: If I push out 15 years or so, then non-silicon compute. Quantum, obviously. In five years, the security angle will become more and more important.

Q: My prediction is that in five years this panel will all be AI bots!

And with that, the blinking red light finally brought the panel to a close.

The 60th Anniversary DAC, 2023


Sign up for Sunday Brunch, the weekly Breakfast Bytes email.


