Decoding Formal Club: Arm and Arteris

27 Oct 2017 • 8 minute read

At the latest meeting of the Decoding Formal Club, organized by Oski and sponsored by Cadence, the agenda promised three things:

Lessons Learnt from a Deployment of Sign-Off Formal on a High-Performance Arm CPU, by Vikram Khosa, formal verification lead at Arm (I promised you this week's posts would be Arm-powered)
Architectural Formal Verification of Cache Coherent Protocols, by Chirag Gandhi, senior hardware verification manager at Arteris and Deepa Sahchari, formal verification engineer at Oski
From Flapping Birds to Space Telescopes: The Art and Science of Origami, by Dr. Robert J. Lang, pioneer in computational origami

Of course, the first two sounded pretty interesting in their own right, and they are the topic of this post. I already covered the Computational Origami presentation, which was indeed as interesting as it sounds.

The first surprise I got was the Rob van Blommestein is now the VP marketing at Oski. I think I first met him when he was VP marketing at Springsoft and he signed them up with Semiwiki, and I was the designated blogger. He actually holds a sort of Semiwiki record, since he left Springsoft to go to Jasper and signed them with Semiwiki, and I was the blogger. Then, when Cadence acquired Jasper, he went to S2C and signed them up with Semiwiki...but by then I was at Cadence myself. He says he plans to sign up Oski. There maybe some individual who has signed up two companies, but for sure he will be the only person to sign up four.

Signoff of Formal Deployment on an A-Class Arm CPU Family

Vikram Khosa leads Arm's CPU formal verification in Austin. He presented on Arm's experience using formal. They did some initial projects in the 2013-Q3 2014 timeframe. If you want to read even earlier experiences of Arm with formal approaches, see my (semiwiki) blog post Formal Verification at ARM taken from the 2012 Jasper User Group meeting. (Speaking of, this year's meeting is coming up on November 7 and 8. I'll put a little more detail at the end of this post.) Vikram covered a lot of ground. I will focus on the management aspects, rather than the details of how they used JasperGold.

Vikram started with a little history. Until Q4-2014, Arm did various pilot projects without really making it a formal(!) part of their verification methodology. But they decided to change that with the new Cortex-A CPU (number not yet announced, and if you understand Arm's numbering system then you are smarter then I am...or Arm's CEO who admitted to me that he didn't understand it either). They had a small, dedicated formal team. They hired Oski to help them. They had something that I've heard other engineers in other companies say is critical: strong management support.

One big change was that "formal" was added to signoff goals for several units of the CPU. That meant that it wasn't just used by some designers who felt like using it. It was planned and tracked rigorously, including end-to-end (E2E) checkers. They analyzed proof depth for signoff, tracked and analyzed coverage, and generally made formal verification a first-class citizen up there with simulaton.

They picked targeted units spanning the whole CPU: instruction fetch (predict, fetch), core (rename, commit, issue, floating point datapath), memory system (load/store, L2 cache). There were also a couple of other units focused for formal but not for signoff: instruction decode, and the MMU. These different blocks are color coded in the above diagram. During planning, they ended up with detailed formal test plans, based on the Oski methodology, for IF and L2, reviewed by all of the various groups (designers, formal engineers, simulation engineers). There were less detailed plans for LS and the core. There was minimal upfront planning for ID and MMU, these were basically designer-owned testbenches.

One critical factor was getting designers engaged. Some of this was training, some was assistance with writing properties and debugging traces. The designers varied in how they responded to this. The majority were "medium" users, with a rich set of embedded properties, some interface properties and reactive CEX-debug. The minority of heavy users went further with active usage of formal testbenches for bring up and debug. The light users were stragglers who "got away with as little as they could."

Arm discovered that soon after RTL-feature-complete they hit "the valley of death: most constraints, checkers, and abstractions in place, but not all deep abstractions implemented. But they started to run out of new bugs...and meanwhile simulation was easily going deeper. So they invested in bug-hunting to bridge the gap, since bug-hunting engines can easily explore complex properties. This yielded a steady crop of new CEX. The timing of this is important. You want to hit the Goldilocks zone for the really deep, complex abstractions: too early and RTL instability makes formal investment costly, too late and there are not enough bugs left to be found.

How did it turn out?

They found about 300 bugs, of which two-thirds are pre-alpha, and a slow, steady trickle through beta and beyond
Formal gained mindshare and acceptance within the design team (with some deeper pockets of engagement)
They built a capable formal team

Looking to the future, Vikram said that the key lessons learned were:

Follow the bugs, and focus resources on the most fertile areas of the design, at the right level of the hierarchy
Rigorous planning
Sufficient (and sustained) focus on E2E checkers
Early investment in tools and flows

His final conclusions: formal performance and adoption have come a long way but nowhere done, there is lots of room for growth in capability, and lots of room for more dissemination, both training and organizational. Scaling requires progress on several dimensions, as shown in the diagram below.

Architectural Formal Verification of Arteris Cache Coherent Protocols

Chirag Gandhi of ArterisIP and Deepa Sahchari of Oski presented on verifying cache-coherent protocols. This is, in some ways, a perfect area for formal approaches. You really want to prove that the correct data value always gets used in a cache-coherent protocol, and it is pretty obvious that simulation might miss some obscure corner case. But obscure corner cases are the bread-and-butter of formal. Another issue is that coherency and other properties (like absence of deadlock) have to be verified at the system level, whereas simulation (and its cousins like emulation) only verify a particular implementation.

If you need a backgrounder on cache coherence, then see my post Cache Coherency Is the New Normal.

The architectural formal verification methodology is a three-step process:

Create architectural models for the blocks
Formal verification of system of block models
Validate the block models against the RTL

It is perhaps worth emphasizing this flow. It doesn't use formal to verify the actual implementation directly. Instead, special models are created that model the block-level contract to the system, without all the details, and without worrying about resource constraints. These can then be verified at this system level as in the diagram below.

The implementation is not formally verified directly, instead the RTL implementation is verified against the block models to ensure that the RTL meets the contract for the block. This can be done one block at a time, without ever requiring the entire RTL to be verified as a huge blob.

Chirag and Deepa went into a lot of detail, but as with Arm above, I'll keep it at a high level.

The system-level properties they wanted to prove were:

Cache-coherency (things like that if one agent has a cacheline in "unique" state then the others have it in "invalid" state, or no two agents can both have the same cacheline in "dirty" state)
Arm ACE interface protocol compliance
Data integrity (such as dirty data is never dropped)
Absence of deadlock (all transactions will complete in a finite number of cycles)

So did they find any bugs? In some ways, at this point, you want the answer to be that the network-on-chip was perfectly designed and there were no problems. On the other hand, you want the formal approaches to show their value by finding some deep bugs that would have been impossible (or unlikely) to find through simulation. So yes, they found some bugs, such as the one in the waveforms below that show a master in "unique" state getting a "shared" response.

Two other bugs found were returning dirty data to an ACE master holding data in "unique" state, and loss of dirty data when writeback from an ACE master is closely followed by a read request from a different master.

Even with this abstracted approach, there are scenarios that need more that 36 cycles to reach valid states. But the sequential depth to attain all state combinations is very high. So an even more abstract model is required to handle this complexity. It is actually a superset (not a subset) of the design behavior, with added transitions and reset states that have the effect of reducing the proof depth without masking any potential errors.

Final results were:

The modeling process uncovered specification issues (resulting in adding a new snoop type)
They found protocol-level bugs related to coherency and to interface protocol violations
Even though the RTL was simulated clean before formal, still formal found additional RTL bugs

Jasper User Group

It's that time of year again, formal verification fans. The Jasper User Group Conference is November 7 and 8 on the Cadence campus. Get all the details, including the full agenda and a link for registration. I will see you there (despite also attending CDNLive Israel the day before—yes, it's possible).

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.