Paul McLellan

DAC Tuesday: IBM's AI, Jay's Wall Street View, Lip-Bu's Chat, Monster Chips

27 Jun 2018 • 14 minute read

The second day of DAC needed several clones of me at lunchtime. Lip-Bu Tan's turn in the DAC pavilion overlapped with the digital lunch, which overlapped with a presentation by DARPA on silicon compilers, and a Cadence panel on cloud in the Cloud Pavilion. So I didn't manage to cover all the things I'd have liked to. But here's what I did see.

Dario Gil of IBM

The keynote to start Tuesday off was by Dario Gil of IBM. He is VP of AI and IBM Q, which has to qualify as some sort of record for a job title made of letters rather than words. His talk was titled The Future of Computing: Pushing the Limits of Physics, Architecture, and Systems for AI.

Dario started off by saying that:

AI is the new IT. Yes, there's a lot of hype, but it is the reality that it is the most important thing in computing today.

MIT's introduction to machine learning course is on a sort of Moore's Law of its own, with attendance doubling every couple of years: 128 in 2013, 302 in 2016, and 700 in 2017.

The foundation of AI was laid around the turn of the 20th century by Santiago Ramón y Cajal, who pretty much invented neuroscience and discovered a lot of the microscopic structure of the brain. In the 1940s the first neural nets, with just a few neurons, of course, were created. But the idea of forward propagation (for inference) and back propagation (for training) was already there.

Then not much happened. There was a lot of interest in AI in the late 1980s and early 1990s, all rule-based rather than neural-network based. But that didn't really go anywhere and there was another AI winter. In 2012 there was suddenly a deep learning explosion. Computers were finally fast enough and there was a huge amount of data. YouTube uploads 400 hours of video per minute; Walmart generates 2.5 petabytes of customer data every hour; Facebook has 350M images uploaded every day. The important thing was that features no longer had to be extracted by human engineering; they could be extracted automatically. Neural nets are built using a training phase to calculate the weights and a forward inference phase. At IBM all this stuff is on servers, but here in the semiconductor and EDA worlds, the most important thing is to be able to optimize the weights and do inference on the chip under a reasonable power envelope.
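To make the two phases concrete, here is a minimal sketch of forward propagation (what inference runs) and back propagation (what training uses to adjust the weights), on a toy two-layer network learning XOR. This is just NumPy for illustration and has nothing to do with IBM's stack.

```python
# Toy two-layer network: forward propagation for inference, back propagation
# for training. Purely illustrative -- not IBM code.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

# The weights and biases are what training adjusts.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(10000):
    # Forward propagation: this is all that inference needs.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Back propagation: push the error backwards to get the gradients,
    # then nudge the weights downhill.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))   # predictions for the four XOR inputs after training
```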

Today, the state of the art changes almost monthly. Software is eating the world...but AI is eating some aspects of software.

Because it was DAC, Dario showed some work IBM has been doing on AI-powered design automation. IBM builds high-end server microprocessors (that's what a mainframe is these days). Each chip is 1000 person-years, so hundreds of millions of dollars. Until now, the only way to cope with increased complexity has been increasing the size of the team. But with a system called SynTunSys they use ML to automate the parameter tuning for synthesis, and to capture knowledge from experts. As a result, once their experts had finished the design, SynTunSys improved total negative slack by 38% and latch slack by 60%. These are big numbers, and they are not on an unoptimized design, but on one that the experts had already done their best with.
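SynTunSys itself is IBM-internal and I haven't seen its implementation, but the basic shape of synthesis parameter tuning is easy to sketch: sweep (or, in the real system, intelligently search) combinations of tool settings, score each run by timing, and keep the best. The parameter names and run_synthesis() below are entirely made up for illustration; they are not any real tool's API.

```python
# Illustrative parameter-tuning loop in the spirit of what Dario described.
# The knobs and run_synthesis() are hypothetical, not SynTunSys or a real tool.
import itertools
import random

PARAM_SPACE = {                       # hypothetical synthesis knobs
    "effort":      ["medium", "high", "ultra"],
    "useful_skew": [True, False],
    "max_fanout":  [16, 32, 64],
    "vt_mix":      ["lvt_heavy", "balanced", "hvt_heavy"],
}

def run_synthesis(params):
    """Stand-in for a real synthesis run: returns total negative slack in ps.
    In practice this would launch the tool and parse its timing report."""
    random.seed(hash(frozenset(params.items())) & 0xFFFF)
    return -random.uniform(0, 5000)   # fake TNS, just so the loop runs

best_params, best_tns = None, float("-inf")
for combo in itertools.product(*PARAM_SPACE.values()):
    params = dict(zip(PARAM_SPACE.keys(), combo))
    tns = run_synthesis(params)       # less negative TNS is better
    if tns > best_tns:
        best_params, best_tns = params, tns

print(f"best TNS {best_tns:.0f} ps with {best_params}")
```

The real value, as Dario described it, is in capturing expert knowledge and learning from previous runs so you don't have to sweep everything; this brute-force loop is just the skeleton.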

One thing Dario emphasized is that general AI is not close. He said it was 2050 or later, but as he immediately admitted, "when we put numbers like that it means we have no idea."

One big challenge going forward is explainability. We need to create AI that is less of a black box, so we can have debuggers and understand why decisions were made. Today's AI is very fragile and it is easy to do these tricks where you imperceptibly alter an image and it suddenly says an airplane is a zebra (see my post Fooling Neural Networks for more on that). Another challenge is reducing the amount of data we need for training. It's great to learn by example, but why do we need 10,000 or 100,000 images? See my post Embedded Vision Summit: "It's a Visual World" for details, but the money quote is that if you take a toddler to the zoo and say "that is a zebra" then the toddler is trained and doesn't need the other 9,999 images. AI is nowhere close to doing that.
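The "imperceptibly alter an image" trick is usually some flavor of the fast gradient sign method: compute the gradient of the classifier's loss with respect to the input pixels and nudge every pixel a tiny amount in the loss-increasing direction. Here is a toy sketch on a random linear "classifier"; the weights and image are made up, so it only illustrates the mechanics, not a real airplane-to-zebra flip.

```python
# Toy fast-gradient-sign-style perturbation on a random linear classifier.
# Everything here (weights, "image", classes) is invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_classes = 32 * 32, 10
W = rng.normal(size=(n_classes, n_pixels))   # toy classifier weights
x = rng.uniform(size=n_pixels)               # toy "image", pixels in [0, 1]
true_class = int(np.argmax(W @ x))           # whatever it currently predicts

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Gradient of the cross-entropy loss with respect to the input pixels.
p = softmax(W @ x)
grad_x = W.T @ (p - np.eye(n_classes)[true_class])

# Nudge every pixel by at most 5% in the loss-increasing direction.
eps = 0.05
x_adv = x + eps * np.sign(grad_x)

# With a real trained model and image, a perturbation this small is often
# enough to flip the predicted class while looking identical to a human.
print("before:", true_class, " after:", int(np.argmax(W @ x_adv)))
```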

Another challenge is combining learning and reasoning, combining modern AI with the techniques that started to be developed in the 1980s and 1990s.

He talked a bit about bias. As it happens, I have a post on bias in embedded vision (which is pretty much the same thing) coming up next week.

On to the computing infrastructure for AI. He divided this into three:

  1. GPUs, algorithms, and architecture. Basically, using modern semiconductor technology the best way we can, reducing precision (which is surprisingly easy, 8 bits is almost as good as 32-bit floating point; see the sketch after this list).
  2. CMOS but with new features, like memory in the interconnect (phase change memory, resistive RAM).
  3. Quantum computing.
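On the reduced-precision point, a minimal sketch of why 8 bits is "surprisingly easy": quantize a vector of float32 weights to int8 with simple symmetric scaling and compare a dot product. Nothing IBM-specific here, just the arithmetic.

```python
# Quantize float32 weights to int8 and compare against the full-precision result.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=1024).astype(np.float32)
activations = rng.normal(size=1024).astype(np.float32)

# Symmetric linear quantization to signed 8-bit integers.
scale = np.abs(weights).max() / 127.0
w_int8 = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale   # what the int8 weights represent

exact = float(weights @ activations)
approx = float(w_dequant @ activations)
max_err = float(np.abs(weights - w_dequant).max())

print(f"fp32 dot: {exact:.4f}   int8-weight dot: {approx:.4f}   "
      f"max per-weight error: {max_err:.5f}")
```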

I am going to admit that, for the Nth time, I've had quantum computing and qubits explained to me...but I still don't understand it. I actually understand the entangled particle stuff; I even did all the math as an undergraduate, although about all I can remember now is Schrödinger's Equation and the Heisenberg Uncertainty Principle (which does not say that you can't measure something without changing it, in the classical sense).

If you want to try this, IBM has many quantum computing devices, cryogenically cooled to almost absolute zero. Three of these they make available open-source style, and anyone can use them to learn. "Give it a try," were Dario's final words.

Jay Vleeschhouwer of Griffin Securities

When I worked for Cadence in the early 2000s, I was the house-trained technical guy who could talk to investment analysts and knew what I could and could not say. I often ended up talking with Jay Vleeschhouwer, who then worked for one of the big guys (Merrill Lynch?) but now works for Griffin Securities. I like to say I've known him for long enough that I can spell his name. Who has a double-h in the middle of their name? He gave the view-from-Wall-Street presentation to kick off the morning at the DAC Pavilion up on the second floor. Today I met someone who asked me where a booth he couldn't find was, and I told him it was upstairs. He hadn't discovered that the expo is spread over two floors this year.

Since some of this will be about Cadence, let me state that this is all Jay's opinion, not mine, and I may or may not know some confidential internal stuff that I'm not revealing here. But I don't want to preface every statement with "Jay thinks".

So if you want to feel good about EDA (and who wouldn't?) then here is EDA growth over the last decade or so. TTM doesn't mean "time to market" but "trailing twelve months", which evens out the quarterly ups and downs, especially Q4 to Q1. The big dip is the recession and the reset of Cadence after the Fister years.

Jay doesn't just follow EDA, he follows a lot of software companies, and he says there are two big arms races going on: one around software development (Microsoft just acquired GitHub, for example) and one around silicon development (Microsoft, Amazon, Google, Facebook, and Apple are all doing more and more silicon development).

One datapoint: the combined market values of Cadence and Synopsys (pre-DAC) are $26.3B, or 5X estimated combined revenues. The EDA industry grew 9% in 2017 to $7.03B, and for 2018 Jay estimates 4-4.5% growth, to $7.29B to $7.34B. There was a lot of emulation upside in last year's growth (revenue is recognized the moment an emulator leaves the loading dock, under FASB rules). He thinks the profit margin will increase.
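A quick back-of-the-envelope on those figures (amounts in $B; the endpoints land close to Jay's quoted range, with any small difference down to rounding or a slightly different base):

```python
# Back-of-the-envelope check of Jay's EDA figures (all amounts in $B).
eda_2017 = 7.03
eda_2016 = eda_2017 / 1.09                       # 9% growth in 2017 implies ~6.45B in 2016
est_2018 = (eda_2017 * 1.04, eda_2017 * 1.045)   # 4-4.5% growth, roughly 7.3B

combined_market_cap = 26.3                       # Cadence + Synopsys, pre-DAC
implied_revenue = combined_market_cap / 5        # 5x multiple implies ~5.3B combined revenue

print(f"2016 EDA: {eda_2016:.2f}  2018 estimate: {est_2018[0]:.2f}-{est_2018[1]:.2f}")
print(f"implied CDNS+SNPS combined revenue: {implied_revenue:.2f}")
```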

The big phenomenon of the last decade is the rise of what Jay calls "non-Japan Asia" and the rest of us call "China". Well, there is growth in India and in other countries too, but it is mostly a story about China. Asia is now more than Japan and Europe combined, and so is a crucial market for EDA companies. It is not just EDA, it is a growing market for IP too.

Cadence's book-to-bill has been bigger than Synopsys's over the last few years, which shows share gain. Book-to-bill is the ratio of bookings (new orders) to billings (revenue). Traditionally, if this number is less than one it is called "negative", even though it is actually still positive. But it shows a shrinking business, getting fewer new orders. One amazing statistic is that Intel has historically accounted for 16% of Synopsys's revenue, and in 2017 that grew to 18%. All other companies are mid-single digits at most, so this is a unique situation. Intel's spend with Synopsys was up $100M in 2016, and up another $100M in 2017, most of that driven by hardware. Even so, Cadence's position at Intel improved as well.
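If book-to-bill is an unfamiliar metric, a toy example (the numbers are invented):

```python
# Toy book-to-bill calculation -- the numbers are invented for illustration.
bookings = 95.0    # new orders taken in the quarter ($M)
billings = 100.0   # revenue recognized in the quarter ($M)

ratio = bookings / billings
print(f"book-to-bill = {ratio:.2f}")
# Below 1.0 this is conventionally called "negative" even though the number is
# positive: the company is billing existing backlog faster than it books new work.
```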

Jay also looks at job openings at the EDA companies and uses them as a leading indicator of, for example, growing sales forces in Asia. A few years ago a sharp falloff in openings at one company made Jay worry, and a few weeks later they had a major layoff. Currently, openings are strong across the industry, meaning that everyone has a positive feel about where the industry is going, with a willingness to invest and ramp up.

Jay is positive too:

It is a matter of public record we have published ongoing positive recommendations for Cadence and Synopsys (and peers elsewhere in engineering software, which are getting more inter-related…that’s the meaning of Mentor/Siemens and Ansys/Apache).

Someone asked about IP. Jay said that IP has been growing more quickly than core EDA. Synopsys combines IP, HAPS and Software Integrity, but Jay has reverse engineered the numbers, and it is mostly IP. IP has also resumed growth for Cadence. IP has been especially important in share gain for Synopsys in Asia.

In reply to a question, Jay said that the top 3 have grown from 75% to 85% of the whole industry, through a combination of growth and acquisitions. If you add the EDA part of ANSYS, the top 4 are over 90% of the industry. Altium, Zuken, and Keysight are next. EDA companies, especially Cadence, are more focused on internal development. There are fewer startups, partly due to the lack of IPOs, and partly because EDA consists of lots of small niches (which are unexciting to investors) and big sectors (where one or more of the big guys already have a strong market position).

Straight Talk with Lip-Bu Tan

 Next up was Lip-Bu Tan talking to Ed Sperling. I will cover that in its own post in a week or so.

Lunch with Monster Chips

Yesterday it was the verification lunch. Today it was digital. The panel was moderated by my SemiWiki ex-coblogger Tom Dillinger. The panelists were:

  • Antony Sebastine of Arm
  • Anthony Hill of Texas Instruments (TI)
  • Patrick Sproule of NVIDIA
  • Anand Sethuraman of Broadcom
  • Chuck Alpert of Cadence

I won't try and cover everything that was said but pick a few key points that I thought were surprising or significant.

Tom asked a basic nuts-and-bolts question as to whether the tools were scaling.

Everyone agreed that people are generally doing 2-3M instance blocks. The tools can handle that in a reasonable amount of time. But NVIDIA at least would like to go to 10M, 20M and even 30M instance blocks. They feel they are losing out in a lot of ways from splitting the design up: having to do timing budgets, and losing area. Patrick's estimate was that 5% of area and 5% of performance is being lost, but they can't go to much bigger blocks since the run times get too long and the memory requirements exceed the biggest servers. Huge blocks take 15 days to run. Anand agreed, but said that QoR degrades on the bigger blocks too, not just that the runtimes become too long. Chuck (from Cadence) wondered how he even knew that the tools degraded if he could not try it, but Anand said that if they weren't getting anywhere they would divide a block up and then make progress. Antony said 2-3M instance blocks is where most of the industry sits today, in his experience working with Arm's partners.

Patrick (NVIDIA) said that some parts of the design process are more automated than before. Macro placement is a good example: we spent 20 years with macro placement not working all that well. But they don't know if it works as well on 20M instance blocks, because a design that size exceeds what they can fit on their largest servers.

Tom asked about DRC rule complexity. I was surprised to learn that DRC is not a big deal. There is always some turmoil on a new process node but then it settles down. Anthony (TI) said that blocks come out DRC clean even on advanced processes. Timing is just one ECO unless you are really pushing performance. The biggest challenges are IR drop, EM, and reliability. These used to be second-tier issues in the past, but as you continue scaling, wires, vias, and contacts don't get any better. So now the issues are not where EDA has been shining the brightest lights for the last couple of decades.

Tom asked about test insertion and DFT.

Anthony (TI) said that you still need to meet the same quality goals for big digital designs for automotive as they used to have for small analog designs. Test point insertion is important, but most physical designers have no idea what he is talking about. However, you need to make sure the design is testable before physical design is done. You can't do it afterwards; that's an ECO mess.

"What about variation?" Tom asked.

Anand (Broadcom) said that one way to minimize it is in the design itself. In particular, make sure the clock tree looks very symmetrical: same layers, similar wire widths. The first goal is to achieve the latency and skew targets, of course, but symmetry helps minimize variation. Chuck (Cadence) said that we try and take advantage of thick metal and do a clock tree that is structured at the top and variable underneath, which also makes it much easier to parallelize clock tree insertion. Anthony (TI) agreed. If you get clock and power right, you've solved a lot of problems. One trick is to throw out library cells that are too sensitive. That way they can minimize the number of signoff scenarios. At 28nm, that is only about 8 if you do it right, which is surprisingly few. Chuck said that hopefully you shouldn't have to remove them, because the tool would naturally swap out those cells, but others on the panel reckoned that taking them out just guaranteed they would not be used and reduced the risk.

Tom asked about improving optimization in the tools for EM and noise.

Chuck (Cadence) as the EDA guy took the challenge.

When I think about optimization, it's like this. Early on in the process, the design is changing a lot; gradually the churn goes down as the design converges, and meanwhile the accuracy of the analysis improves. The problem is that if you miss an effect until too late, then it is hard to fix. So we have pushed stuff higher up the flow: we model the clock early in placement, and make synthesis physically aware. All these things are how we get the PPA wins and avoid problems later in the optimization. Otherwise, if there are too many violations, we are just going to get stuck.

There was a discussion about cloud and compute farms. These are all big companies (TI, Broadcom, NVIDIA, Arm) who have their own farms. Everyone seemed to feel that it was more cost-effective than using the cloud. Even if you only need 1000 CPUs for DRC, it is still more effective to buy them, Anthony (TI) said. That's assuming you have other things you can run on the cores once that month or two is over, like SPICE simulations (ooh, he dared talk about analog at a big-digital-chip lunch).

Chuck (Cadence) got the final question. "What keeps you up at night?"

I’m an R&D guy and we see what the customers give us. When R&D sees the block for things like AI chips and they break the memory, we fix the memory. We need to partner early and really push the envelope, work with the blocks and figure out how to solve them. You can’t write a placer for a 10M gate design without a 10M gate design. We can make them up but that’s only a stopgap. You have to give the test cases to us. 
