Earlier this week, Cadence announced the Jasper C2RTL App, which I covered in my post Announcing Jasper C2RTL App: Formal for Algorithmic Designs. At the Jasper User Group, Disha Puri presented *Datapath Formal Verification 101: Technology + Technique*, which covered both the theory and practice of datapath formal verification (FV). Two uses cases she went deeper into were dot-product accumulate systolic (DPAS) design, and hashing design.

### Datapath and Control Path

Designs are divided into two broad categories, datapath and control path. Any algorithm that involves data manipulation or data transformation falls in the category of datapath, and anything that has to do with data transfer, such as FIFOs or muxes, falls into the category of control path. In graphics especially, there are a lot of deep-rooted complex algorithms such as compression, arithmetic design in processor execution units such as floating-point, hashing, ray-tracing, render2texture mapping, tiling, and many more. These are not new algorithms, but they are complex and optimized to deal with different microarchitectures.

So the obvious first question is "Is simulation enough?" The example she used is that you have been tasked with designing a double-precisions floating-point FMA (fused multiple add...something used in neural networks a lot). This needs to accommodate changes over generations based on customer needs, and the same design must also work with different microarchitectures such as mobile, laptop, servers, and so on. Obviously, since it is 64-bit floating-point it has a huge state space. Since this is a presentation at a formal verification user group, it is no surprise that Disha's answer is "no." Simulation is definitely not enough.

### Datapath Formal—Theory and Practice

So we need to have an understanding of how formal can step in. Disha's group has been doing datapath formal for quite some time now. She said she would start with the basics and then move on to a couple of interesting case studies.

Of course, formal verification is proof that a system adheres to a specification. The system can be a piece of RTL...but also a software program, a netlist, a security protocol, or a high-level model of hardware. And the specification can be assertions, something like the IEEE floating-point spec, deadlock-freeness, cache-coherence, and so on. A formal verification proof amounts to exhaustive simulation, meaning that for all possible inputs you have validated that an assertion is always true.

Traditionally, datapath FV is done as C to RTL equivalence. We say that the inputs to the C++ and RTL will always be the same, and FV has to answer the question of whether the outputs will always be the same in all possible scenarios.

The reason that we use C++ is that it is a high abstraction level and we can trust C++ more. For example, taking the same example of writing a FMA design, the RTL has a lot of hardware constraints and integration parameters to make sure that you have an optimized design. However, in C++ you can simply write A*B+C using any floating-point library and you would have your design. And if we have proved that C++ gives the same output as the RTL for all possible input scenarios, then you can safely say that your RTL is, in fact, correct.

However, there are three nuances that need to be dealt with in practice. The three Cs:

- Compilation
- Correlation of inputs and outputs
- Convergence.

### Case Study 1: Dot Product Accumulate Systolic (DPAS)

This has multiple stages and allows various data formats. These sorts of designs are very popular in machine-learning applications.

There are three inputs A, B, and C. The algorithm is to split A and B and calculate A1*B1 + A2*B2 + C. The C for the first stage is coming from outside, but for all subsequent stages, the C is coming from the previous stage. These are not standard IEEE formats since there are custom precisions involved. At each stage there are complications from floating-point features like NaN (not-a-number), infinities, and so on. The architect gives the specification to the C++ designer and the RTL designer. They work in silos and come up with designs optimized for their own purposes.

But when the designs are given to the FV designer, there may be issues. Not everything is formal-friendly. For example, if we want to test exhaustively, the limits of the inputs must be known to us. For example, we can't give an unbounded loop to a tool. If it is not determined at compile time then there must be some way to determine it at run time and state that there is, indeed, a limit.

In fact, as vendors are improving and consuming as much C++ as possible, we actually had pretty good experience in consuming the C++ model in these two case studies. I will be talking about the second case-study where we were able to consume the C++ model without any problems.

Their practice is to go back to the C++ engineers and have them make the code formal-friendly. In this case study, the C++ was about 1000 lines of code. It is possible that the C++ to RTL can pass, but the intention with which the algorithm has been written may not be following through in the C++ or even in the RTL.

So when we are directly dealing with the architects, we request them to add as many C++ assertions as possible. And this design, he added more than 50 C++ assertions...then at runtime it is possible to prove whether that assertion holds true or not. And because of that we were able to catch multiple overflow and underflow conditions that were not being honored by the C++. The architect was able to fix his model to adhere to those expectations.

The next step is how to map the C++ and the RTL together since they may not have the same interface mappings. For example, a designer working on compression/decompression might say that anything that comes out of compression is a valid input to decompression. But that is not good enough for FV since the inputs to the decompression block need to be constrained in isolation so that the decompression block can be verified.

Finding bugs turns out to be the easiest step once all these fixes to the inputs have been done, and, of course, it has the biggest return on investment.

The next step is convergence. Once all the potential problems in the design have been found, the challenge is to converge on the design to get 100% confidence. For example, in this design it makes sense to isolate every stage, and then connect them together. That is a sort of divide and conquer.

At this point they took all the constraints added for FV and added them back as SystemVerilog Assertions (SVA). This found 3 bugs where issues had been masked but were real issues. In the first generation of the design, the formal team can go back to the designers. But for future generations, you want to be able to do regressions automatically. By adding constraints as SVA, if something changes then it will break in simulation and everyone is aware that something needs to get fixed in FV (where the SVA came from).

Above is a summary of the design flow for C2RTL formal verification of this design.

### Case Study 2: Hashing

This is a much simpler design, and Disha said she was using it to show how much the tools have improved over the years. They were able to compile the C++ for the design without any manual intervention in a matter of minutes. Kudos to the compilation part of the process.

Since the design has a latency of 600, and there are 32 bits for C++, and there are 1024 assertions, the formal analysis is deep and complex. But all assertions were proved correct in 48 hours. Kudos to the formal engines part of the process.

### Key Takeaways

- Complex datapaths cannot be covered by simulation alone
- Identify the right block
- The 3 Cs of formal verification: Compilation, Correlation, Convergence
- Human interactions play an important role
- Implement the complete process flow to get complete confidence
- Bugs lie in adjacencies as well

**Sign up for Sunday Brunch, the weekly Breakfast Bytes email.**