Paul McLellan
10 Mar 2021

Paul Cunningham's DVCon Keynote: Verification Throughput = Engines × Logistics

At DVCon 2021, the keynote was presented by Cadence's Paul Cunningham, who is basically Cadence's Mr. Verification (officially, he is Corporate VP and General Manager of the System & Verification Group). He titled his presentation Computational Software for Intelligent System Design.

To show how little things have changed at the big-picture level, Paul pointed out that Cadence has been a public company for over 30 years and pulled a paragraph out of Cadence's IPO filing. There's a story behind this that Paul didn't go into: the paragraph is actually from SDA's 1987 IPO filing, but that IPO never happened because the 1987 stock market crash fell on the same day. Instead, SDA merged with the already-public ECAD to create Cadence in 1988. So yes, Cadence has been a public company for over 30 years.

Today, Paul pointed out, nobody talks about automation anymore. It is just taken as a given, a fact of life. In some ways, verification is different from other EDA tasks in that it is an infinite problem. The number of cycles required for exhaustive verification is on the order of the age of the universe in seconds, so that is not going to happen. Instead, at some point, design groups make a judgment call that delaying tapeout would be riskier than going for it, and just say "enough is enough".
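To make that concrete, here is a back-of-the-envelope sketch. The figures are my own illustration, not from the keynote, and they wildly understate the problem, since real designs have millions of state bits plus input sequences on top.

```python
# Back-of-the-envelope illustration (my numbers, not Paul's): even a toy design
# with only 60 bits of state has more distinct states than the age of the
# universe measured in seconds.

AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600   # roughly 4.4e17 seconds

state_bits = 60                # deliberately tiny; real SoCs have millions of flops
states = 2 ** state_bits       # about 1.2e18 distinct states

# Even visiting one state per second would not finish, and exhaustive
# verification also has to cover input sequences, which is far worse.
print(f"states / (age of universe in seconds) = {states / AGE_OF_UNIVERSE_S:.1f}x")
```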

Paul talked about UPS (or anyone else like FedEx or Amazon) and the idea that Throughput = Engines × Logistics. In this case, the engines are planes and vans. An important aspect of these engines is that they are different: one is not better than the other, they have different purposes. UPS is not going to deliver to your home by plane, and it is not going to do overnight delivery from California to Florida with a van.

Paul continued:

There is a famous quote by General Bradley of the US Army that "Amateurs talk strategy; professionals talk logistics". And I think the same mindset is absolutely applicable to verification performance. It is not just about the engines but about how they are connected together to deliver. Which tests are run on which engines? Which coverage challenges on which engine? How many engines do you need? All to fix as many bugs as possible within the schedule and budget.

So the equivalent to UPS's throughput is that Verification Throughput = Engines × Logistics. Cadence has four verification engines: JasperGold for formal, Xcelium for simulation, Palladium for emulation, and Protium for FPGA prototyping. Like UPS vehicles, they all have different purposes and tradeoffs.

One challenge with verification is that demand is almost infinite in terms of the volume of jobs, and as a result verification is always overutilizing the available compute farms. Jobs therefore spend a lot of their time waiting for a server to become available. The graph Paul showed plots in blue the total CPU time to run the jobs, with the red line showing the total queue time. As Paul said:

While many of you may be looking at this chart in disbelief, it is honestly something that we see often.

Another interesting perspective is to look at performance versus server type, and then normalize that to the cost of buying the server, which gives throughput per dollar. There is a massive spread, not just 10% or 20%, but an order of magnitude. When you look at verification throughput with the price tag attached, there are clearly huge opportunities and challenges in the logistics.
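As a minimal sketch of that normalization (the server names and numbers below are invented purely for illustration, not Cadence data):

```python
# Divide each server type's simulation performance by its purchase cost to get
# throughput per dollar, as Paul described. All numbers are illustrative only.

servers = {
    # name: (relative simulation performance, server cost in $)
    "old_4core":  (1.0,  3000),
    "mid_16core": (3.5,  8000),
    "new_64core": (9.0, 15000),
}

for name, (perf, cost) in servers.items():
    print(f"{name:12s} throughput per dollar = {perf / cost:.2e}")
```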

The heart of Cadence's Verification Logistics is vManager. It is "one engine to rule them all":

vManager offers our customers a complete enterprise solution for multi-project, multi-site, multi-engine verification management. It includes a high-availability database architecture with redundancy and scalable proxy server interfaces. It can schedule jobs across all our engines, and roll up coverage and other job analytics across all our engines. The platform includes advanced web-based visual analytics and job controls and is fully programmable and extensible with a rich library of Python APIs.

Paul went on to talk about Xcelium ML, which I've already covered in Xcelium ML: Black-Belt Verification Engineer in a Tool and Under the Hood of Xcelium ML. He also talked about machine learning in JasperGold, which I covered in Machine Learning in JasperGold and Jasper User Group: The State of Formal in 2020.

Paul talked briefly about the dynamic duo of Palladium and Protium:

Cadence has been a market leader in emulation for more than a decade with our flagship Palladium emulation system. Palladium uses a custom proprietary processor we design and tape out here at Cadence. It's the ultimate chip debug accelerator, offering lightning-fast, predictable compiles and the ability to stop, start, and probe any signal at any time. But now Palladium has a system-level partner, Protium. Protium is targeted at accelerating pre-silicon embedded software debug. Protium runs 3-5X faster than Palladium and, like Palladium, comes in a standard datacenter rack size, scalable to multi-billion gate designs. We've also designed Protium from the ground up with a unified compile flow and unified transactors and hardware bridges to make it very easy to move a design from Palladium to Protium.

Where Palladium is our customers’ ultimate debug machine, Protium is their ultimate performance machine. And they can buy whatever mix of each achieves the highest verification throughput for their specific product needs.

Q&A

Paul's presentation had been pre-recorded, but it was followed by a live Q&A. I will pull out a few of the more interesting questions.

Q: What about Xcelium on Arm?

Yes, it is fully supported across the full flow, now including JasperGold. It is a long tail to get every app and every feature fully qualified, but the big engines are already there.

Q: It would be interesting if Xcelium ML could leverage ML cores on chips like M1...although obviously, that's not currently available in a data center context. Is that a possibility?

Absolutely. It is still early days in the ML journey. However, actual training time is a small percentage of verification compute, so optimizing training is not so relevant. Inference is also a small part, with most of the compute going to the simulation itself. So right now it is not a first-order term in the equation.
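Paul's point here is essentially Amdahl's law: if ML training and inference are only a few percent of total verification compute, even an enormous speedup on them barely moves overall throughput. A quick sketch, with the 3% fraction assumed purely for illustration:

```python
# Amdahl's-law style check of the answer above, with an invented ML fraction.

def overall_speedup(fraction_accelerated: float, speedup: float) -> float:
    """Speedup of the whole workload when only a fraction of it is accelerated."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / speedup)

ml_fraction = 0.03  # assume ML training + inference is ~3% of verification compute
print(overall_speedup(ml_fraction, 10.0))          # ~1.03x overall
print(overall_speedup(ml_fraction, float("inf")))  # still only ~1.03x, even with infinite speedup
```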

Q: What about taking hardware simulation to accelerate mixed-signal real-number models too?

That makes a lot of sense, and real-number simulation can be accelerated in various ways. We spent a bunch of time on it and didn't get the implementation right the first time. We are now reimplementing it and hopefully getting it right this time.

Q: Do we need faster test bench development to quickly test updates?

Yeah, I need to think about that a little bit more. Save and restore is widely used as a throughput optimizer, especially for directed tests at the SoC level with a common init sequence. Save once and restore multiple times.
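A toy model of that save-once, restore-many arithmetic (all numbers invented for illustration): pay the common SoC init sequence once, snapshot, then restore cheaply for each directed test instead of re-running init every time.

```python
# Illustrative save/restore throughput model; the times and test count are made up.

init_hours    = 4.0       # common SoC boot/init sequence
restore_hours = 5 / 60    # restoring a snapshot, ~5 minutes
test_hours    = 1.0       # each directed test after init
num_tests     = 50

without_snapshot = num_tests * (init_hours + test_hours)
with_snapshot    = init_hours + num_tests * (restore_hours + test_hours)

print(f"without save/restore: {without_snapshot:.0f} engine-hours")
print(f"with save/restore:    {with_snapshot:.0f} engine-hours")
```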

Q: What about startups and acquisitions?

I joined through acquisition myself. VIP came through Denali, constrained random through Verisity, and Palladium from IBM. I'm not sure if there is more impact today from startups than 5 or 10 years ago; it seems ongoing.

Q: What about ML for automated bug fixing?

There is nothing today, but it is an opportunity for wonderful things to come.

Q: Does Xcelium use the GPU?

We need to be open-minded, and we look a lot at GPUs, but what we've found is that with the underlying event queue problem there is massive random access. It is all load/store dominated, so the important thing is the size of the L3 cache, not the ability to burst-load data in and then compute. So it is difficult to get performance per $. We've not been able to get there using GPUs. But they are becoming more varied, so it is a moving target and we look continuously. So a future GPU might intersect with simulation, maybe. But other areas of EDA may be a better fit.
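To see why the event queue resists GPU-style streaming, here is a generic textbook sketch of an event-driven simulation kernel (my illustration, not Xcelium's implementation). Each popped event fans out to whatever processes are sensitive to that signal, which is scattered, data-dependent pointer chasing rather than the regular bulk arithmetic GPUs are built for.

```python
import heapq
from collections import defaultdict

# Minimal event-driven simulation kernel (generic sketch, not Xcelium).
# The work per event is a handful of scattered reads/writes into the netlist
# and queue, which is why cache behavior matters more than raw compute.

event_queue = []               # min-heap of (time, signal, value)
fanout = defaultdict(list)     # signal -> processes sensitive to it
signal_values = {}             # current value of each signal

def schedule(time, signal, value):
    heapq.heappush(event_queue, (time, signal, value))

def run():
    while event_queue:
        time, signal, value = heapq.heappop(event_queue)
        if signal_values.get(signal) == value:
            continue                           # no change, no new activity
        signal_values[signal] = value
        for process in fanout[signal]:         # scattered, data-dependent accesses
            process(time, signal, value)       # may call schedule() with new events
```

A "process" here is just a callable that reads some signals and schedules new events; the point is the irregular access pattern, not the modeling detail.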

Q: What about cloud compute?

Enabling EDA on the cloud is an important vector. About 3 years ago we really started emphasizing our commitment to cloud compute as a company. The transformational opportunity is that compute is infinite and infinitely homogeneous. You can get whatever machines you need at any time. Your needs day to day are not constant. Cloud makes it possible to statistically multiplex across the industry, versus each of us having our own private farms. We are sharing compute across the whole industry, and this creates the perception of infinitely elastic compute. Verification is the dominant consumer of compute. Cadence has a number of business models in play. Already, the full verification flow is cloud-enabled.
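A quick illustration of the statistical-multiplexing argument (the companies, demand profile, and numbers are all invented): each company's demand is bursty, but the bursts rarely line up, so a shared pool can be sized well below the sum of everyone's private peak-sized farms.

```python
import random

# Toy model of statistical multiplexing across companies; numbers are illustrative.
random.seed(0)
days, companies = 365, 20

# Each company's daily demand is bursty: usually low, occasionally a big regression.
demand = [[random.choice([100, 100, 100, 1000]) for _ in range(days)]
          for _ in range(companies)]

private_capacity = sum(max(d) for d in demand)  # everyone sizes a farm for their own peak
shared_capacity = max(sum(d[day] for d in demand) for day in range(days))  # pool sized for the combined peak

print(private_capacity, shared_capacity)  # the shared peak is far below the sum of private peaks
```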

Q: How will ML change verification engineers' lives?

It will change everything. For us, our kids, and everyone.

And that seemed a good point for DVCon to wrap up Paul's session and thank him.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Tags:
  • computational logistics
  • dvcon 2021
  • DVcon
  • verification