What if I told you I knew someone who could improve your regression efficiency: make fewer runs, spend less runtime on the runs you do make, and have the same coverage at the end?
You'd say that he or she sounds like a great verification engineer with years of experience. You'd ask for a resume, or even get an offer letter ready before your competition beats you to the punch.
As it happens, I do know somebody like that. But you don't need a resume. Today, Cadence announced Xcelium ML, or what is officially the Xcelium Parallel Logic Simulation with Machine Learning Technology.
Economists have a phrase, "there ain't no such thing as a free lunch", that they abbreviate to TANSTAAFL. This captures the idea that there are always resource constraints and therefore tradeoffs. Even if you have a lot of money, you can't go out for dinner and go to see a new movie at the same time, so there is a tradeoff. (Well, right now you can't do either, you have to stay home, but that is just a different tradeoff.) Verification engineers could have a similar phrase: there is no such thing as a free vector. The amount of simulation you can do is always limited, and you want to apply those vectors in the most efficient way possible on the design instances that make the most difference. Experienced verification engineers do some of that, consciously or unconsciously, in which simulations they select to run and, for constrained random, how long they run them. Xcelium ML makes similar choices in a more structured way.
Like many deep learning approaches, there is a training phase and an inference phase. During the training phase, a large amount of data is used to calculate weights for a neural network. During the inference phase, the weights are used to operate on a new, possibly related, set of data. The key aspect of this is that, even though the new set of data has not been seen before, it has enough in common with the old that the algorithm does better, often very much better, than random. So, for example, Netflix's recommendation engine can recommend movies that you will probably like even though you've presumably never seen them or rated them.
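The two-phase pattern can be pictured with a deliberately tiny sketch: a one-weight linear model, fit in a "training" step and applied in an "inference" step. This is purely illustrative of the train/infer split, not Cadence's actual algorithm, and all names and data here are made up.

```python
# Minimal sketch of the train/infer split: fit a weight on seen data,
# then apply it to unseen data. Illustrative only.

def train(examples):
    """Training phase: derive a weight from labeled (feature, label) pairs.

    Fits label ~ w * feature by least squares: w = sum(x*y) / sum(x*x).
    """
    num = sum(x * y for x, y in examples)
    den = sum(x * x for x, _ in examples)
    return num / den

def infer(w, feature):
    """Inference phase: apply the learned weight to a new input."""
    return w * feature

# Train on data the model has seen...
w = train([(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)])

# ...then predict on an input it has not. The new point still benefits
# from what it has in common with the training data.
prediction = infer(w, 4.0)
```

The point is only the shape of the workflow: an expensive one-time fit, then cheap repeated application of the learned weights to data the model never saw.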
In the case of Xcelium ML, the learning phase consists of running a randomized test suite and using the results for training. The inference phase runs the same test suite, using the trained model to steer the simulations. The results are impressive, delivering the same coverage up to five times faster.
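One way to picture "same coverage from fewer, shorter runs": if a model can predict which cover bins each test tends to hit, a scheduler can greedily pick tests until every required bin is predicted covered, and skip the rest. This toy greedy selection is my own illustration of the idea, not Xcelium ML's actual mechanism, and the test names and bin predictions are invented.

```python
def select_tests(predicted_bins, required):
    """Greedily pick tests until every required cover bin is hit.

    predicted_bins maps test name -> set of bins the model expects it
    to cover; required is the set of bins the regression must close.
    """
    remaining = set(required)
    schedule = []
    while remaining:
        # Pick the test predicted to close the most still-open bins.
        best = max(predicted_bins, key=lambda t: len(predicted_bins[t] & remaining))
        gained = predicted_bins[best] & remaining
        if not gained:
            break  # no test helps; remaining bins are unreachable
        schedule.append(best)
        remaining -= gained
    return schedule

# Hypothetical predictions: four tests, six cover bins.
preds = {
    "t_rand_a": {1, 2, 3},
    "t_rand_b": {2, 3},
    "t_rand_c": {4, 5},
    "t_rand_d": {5, 6},
}
plan = select_tests(preds, {1, 2, 3, 4, 5, 6})
# Three of the four tests suffice for full coverage; t_rand_b, whose
# predicted bins are all covered elsewhere, never needs to run.
```

In this sketch the savings come from dropping redundant runs; in a real constrained-random flow the model would also influence how long each run continues, which the sketch does not attempt to show.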
This is a huge difference from an ROI standpoint. Back when I was still a practicing software engineer, I read a landmark book called The Art of Software Testing by Glenford Myers. This changed my perspective on testing by pointing out that a "successful" test is one that found a bug, and a test that ran uneventfully to completion was actually a waste of time.
That same perspective leads to the return-on-investment mindset in IC verification, looking at the metric of the number of bugs found per dollar per day. A key dimension of that, especially given that chips are not getting any smaller, is a focus on verification throughput.
Machine learning accelerates regression throughput without impacting the results, which in this case means coverage. The same coverage five times faster should find the same bugs five times faster. Actually, since the process involves randomization, this is not precisely true. The diagram above shows Xcelium running a randomized test suite without machine learning at the top, and with machine learning at the bottom. In fact, the diagram isn't really close to scale, since the lower run takes only around 20% of the runtime of the upper run. Another little detail is that the machine learning engine runs on its own CPU (in red at the bottom) and controls what is going on based on the results being achieved.
These graphs show the results. On the left is the runtime (so obviously smaller bars are better) measured in CPU teracycles. On the right is coverage measured in cover bins (where larger bars are better, although what matters most is that the bars match).
Here's an actual case study. On the left is the run without machine learning, taking 10,000 hours. On the right is Xcelium ML, taking just 2,000 hours. It is not really rigorous to just eyeball graphs like this, but you can see that not only is the coverage similar, the cover bins are also filled earlier.