
Paul McLellan

Jasper User Group: The State of Formal in 2020

29 Oct 2020 • 6 minute read

Last week was the CadenceCONNECT: Jasper User Group conference. Of course, it was a digital event. Normally, the number of attendees is limited by the size of the Cadence building 10 auditorium, but this year there was no physical limit and over 600 people attended, about three times as many as in recent years. As it happens, I've been attending the Jasper User Group meeting for years. Before I rejoined Cadence almost exactly five years ago, Jasper was one of my clients at SemiWiki, so I would attend and write about some of the presentations, and I've done so every year since. One year the logistics were challenging: CadenceLIVE Israel was on the Monday, which meant getting a flight back to California late that night, arriving first thing in the morning, and then driving down to the Cadence campus. In fact, there was enough time that I could even grab some breakfast.

This year's meeting started with a session with two presenters. Ziyad Hanna presented Formal Verification in a Distributed World. Then Richard Paw of Amazon Web Services (AWS) presented JasperGold on AWS Cloud.

Ziyad Hanna

The biggest problem in verification, whether using formal techniques or simulation-based techniques, is the sheer scale of the problem, especially for billion-gate designs. It takes 10¹³ simulation cycles to verify a CPU, 10¹⁵ (100 times as much) to verify a GPU, and the same again to develop the software on an FPGA prototyping system like the Protium X1 platform. It is only another factor of 100 to get to the number of seconds since the Big Bang and the creation of the universe.
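Here is a quick back-of-the-envelope check on those scale comparisons, as a Python sketch. The 13.8-billion-year age of the universe is my own assumption for the arithmetic, not a figure from the talk:

```python
# Back-of-the-envelope check on the scale comparison above.
cpu_cycles = 1e13                    # simulation cycles to verify a CPU
gpu_cycles = 1e15                    # 100X as much for a GPU
universe_age_years = 13.8e9          # assumed age of the universe

seconds_since_big_bang = universe_age_years * 365.25 * 24 * 3600

print(gpu_cycles / cpu_cycles)              # 100.0
print(seconds_since_big_bang / gpu_cycles)  # ~440, so "another factor of
                                            # 100" is the right order of
                                            # magnitude
```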

Formal adoption has been growing fast. Hyperscale computing, 5G, and IoT edge have been increasing adoption at year-on-year growth of 15-20%, and these are already large markets. Meanwhile, AI and security have been increasing adoption by 100-600%, although from a much smaller base. Overall, that has led to a CAGR for the JasperGold platform of 16% over the last four years.

Over the last few years, there has been an explosion in the size of hyperscale data centers, some on-premises, but mostly in the cloud. The challenge is to distribute development across lots of servers. See my recent posts Computational Logistics, vManager: One Manager to Rule Them All, and System VIP: Logistics for Cache-Coherent Multiprocessor Systems for a look at how to orchestrate verification at scale.

However, it is not just about the logistics of verification. Cadence has made a heavy investment in the core technologies under the hood of JasperGold: improved formal models and simplifications at the design level, breakthroughs in combinational solvers (which are hard to scale), and new SAT solvers. One breakthrough is the M Symbolic Simulator.

One particular area is Smart Proof Technology, which combines smarter data management (such as reusing existing results if the logic cone is unchanged), machine learning to select the best macro-level settings (sometimes called ML-outside), and machine learning to select the best algorithm (sometimes called ML-inside). The result is more bugs found faster, with better convergence.
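To make the "reuse existing results if the logic cone is unchanged" idea concrete, here is a minimal Python sketch. It is purely illustrative, assuming a hypothetical `cone_hash` key and `run_engine` callback; this is not JasperGold's actual implementation:

```python
import hashlib
import json

# Hypothetical proof-result cache, keyed by a hash of the logic cone
# that feeds each property. Edits elsewhere in the design leave the
# cone (and therefore the key) unchanged, so the old result is reused.
_proof_cache = {}

def cone_hash(cone_netlist):
    """Hash a canonical representation of a property's fan-in cone."""
    canonical = json.dumps(sorted(cone_netlist)).encode()
    return hashlib.sha256(canonical).hexdigest()

def prove(property_name, cone_netlist, run_engine):
    """Reuse a cached result if the cone is unchanged, else run the proof."""
    key = cone_hash(cone_netlist)
    if key not in _proof_cache:
        _proof_cache[key] = run_engine(property_name)  # the expensive part
    return _proof_cache[key]
```

The key design point is that the cache key depends only on the property's fan-in cone, so changes elsewhere in the design cannot invalidate proofs they couldn't have affected.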

How fast your formal test suite runs is dependent primarily on four things:

  • How well Cadence improves the core algorithms
  • How well you orchestrate your verification strategy
  • How many licenses you have access to
  • How much hardware you have access to

A couple of years ago, Cadence made a big push to support massively parallel EDA tools running in the cloud, and today over 100 customers are already using Cadence Cloud. For more on Cadence Cloud, see my posts CloudBurst: The Best of Both Worlds and Barefoot in a CloudBurst: Tempus on 2000+ CPUs. Plus, you can read the rest of this post to find out what AWS's Richard Paw had to say.

Ziyad wrapped up with the State of the Union in a Distributed World:

  • The verification task continues to grow fast
  • Formal's role in the verification mix is growing even faster
  • We're developing multi-product solutions including formal to address the growing challenges of an increasingly distributed verification world
  • We're seeing good ROI for cloud to provide scalable distributed compute resources

Richard Paw of AWS

Richard started by pointing out that Amazon designs a lot of its own silicon and "eats its own dogfood", using AWS for its own IC development. For a deeper dive, see my post Climbing Annapurna to the Clouds (Annapurna is a company that Amazon acquired a few years ago and is now Amazon's main chip design organization).

Cadence and AWS ran an exercise called JAWS, for JasperGold on AWS. The goals were twofold. First, to provide an easy path for JasperGold customers to run on the AWS cloud, using a realistically complex design and not just a toy example. Second, to gather performance data by running a number of different configurations with different degrees of parallelism: on-premises at Cadence with 12 cores, in the cloud with 96 cores, and in the cloud with 960 cores.

The design was a new Tensilica Xtensa processor, which adds a new secure/non-secure mode. There were 343 properties verifying the security aspects of the processor. This example was picked since it doesn't involve any third-party IP.

Richard summed up the results:

We ran three iterations on all three compute-resource configurations, with each iteration increasing the time allowed per property. The 96-core configuration reduced the time to the same results by 5.5X; the 960-core configuration reduced it by 20X. Due to time constraints, we ran a fourth iteration on the 960-core configuration only. This iteration found new useful proofs, including 23 with cycle bounds large enough to be valuable, and, importantly, found four deep security bugs.

Note that last sentence: Even though this was primarily an exercise in bringing up the JasperGold platform on AWS and measuring the performance, several deep bugs were found.

Because of the way formal works, where some proofs turn out to be easy and others require a lot of deep analysis, the approach recommended from this exercise is to use the parallelism of the cloud to prove the difficult properties and find the deep bugs that you would miss with in-house resources.

For this particular example, a good approach is:

  • Run iterations 1 and 2 (easy properties) on-premises for nine hours
  • Run iteration 3 with 96 cores on AWS cloud for six hours
  • Run iteration 4 with 960 cores on AWS cloud for six hours
  • Use EC2 spot instances (if you don't know what that means, see my post EDA in the Cloud: Astera Labs, AWS, Arm, and Cadence Report).
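If you want to experiment with spot capacity yourself, here is a minimal sketch using boto3, the AWS SDK for Python. The AMI ID is a placeholder and the instance type is just an example choice (not necessarily what JAWS used); in practice, a formal farm would be launched by your job scheduler or orchestration layer rather than by bare API calls:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one compute-optimized instance on the spot market.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder: your own machine image
    InstanceType="c5.24xlarge",        # 96 vCPUs; example instance type
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```

The trade-off with spot instances is that AWS can reclaim them at short notice, which is acceptable for proof jobs that checkpoint their state or are cheap to retry.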

The bottom line was that increasing the number of cores by 8X (from 12 to 96) gave a 5.5X reduction in time to results. Increasing by 80X (from 12 to 960) gave a 20X reduction in time to results.
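A quick bit of arithmetic on those numbers shows the expected diminishing returns as the hardware scales up:

```python
# Parallel efficiency of the JAWS runs quoted above.
base_cores = 12
runs = [
    ("cloud, 96 cores", 96, 5.5),     # 5.5X faster than the 12-core baseline
    ("cloud, 960 cores", 960, 20.0),  # 20X faster than the 12-core baseline
]

for name, cores, speedup in runs:
    scale = cores / base_cores        # how much more hardware
    efficiency = speedup / scale      # fraction of ideal linear speedup
    print(f"{name}: {scale:.0f}X cores -> {speedup}X speedup "
          f"({efficiency:.0%} of linear)")

# cloud, 96 cores:  8X cores -> 5.5X speedup (69% of linear)
# cloud, 960 cores: 80X cores -> 20.0X speedup (25% of linear)
```

That drop-off is consistent with the point above: the easy properties finish quickly no matter what, so the extra cores mostly buy time on the hard ones.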

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.