Protium: FPGA Prototyping the Cadence Way

30 Nov 2016 • 4 minute read

I attended a recent workshop on Protium titled Accelerating Embedded Software Development: Protium—Ease of Use in FPGA-Based Prototyping. By the way, protium really is a word, used in chemistry and physics to specify the isotope of hydrogen whose nucleus is a single proton with no neutrons (as opposed to deuterium and tritium, which add one and two neutrons, respectively).

FPGA Prototyping

The pie chart to the left shows the biggest challenges to using FPGA prototyping.

But, in fact, everyone knows the pros and cons of FPGA prototyping. The big pro is that, short of getting silicon back from manufacturing, it is the fastest possible implementation of your SoC. As such, it is the one that the software developers would most like to use since they are very sensitive to performance. After all, much of what they run involves booting an operating system such as Android or Linux, which is, literally, billions of vectors. Even emulation is too slow and it is an expensive solution when the software team on a big SoC often outnumbers the chip development team.

But there is a big drawback. Everyone also knows that it takes months, again literally, to bring up an FPGA prototype. There will be constructs in the RTL that cannot be directly implemented in an FPGA, the most obvious being memory. If the design is large, then there is the additional challenge of partitioning it across several FPGA devices. Since that means going off one device, across a board, and back onto the next device, the signal delay is going to be a lot longer than if the signal were just driving a nearby gate on the same array.

A side effect of the bringup being difficult is that you only want to use FPGA prototyping late in the design cycle when the RTL is almost stable. Small changes are easy to accommodate, such as adding a few gates, but a major change can mean going back far enough that there will be several weeks' delay again. FPGA prototyping is not the medium of choice for debugging the design either, since every change to the set of signals you want to watch requires a recompile. Debug is much better done on an emulator, such as the Palladium Z1, where all signals are observable, even the ones you didn't think you needed to watch.

Of course, FPGA prototyping is great for the software team since they don't need to look at signals beyond a few critical registers. They just want a stable implementation of the system. In fact, if there is one thing that the software team detests, it is getting heavily involved in hardware debugging. It does nothing, at least directly, to advance their goals.

Reducing TTP

So how can we reduce the bringup time, or time to prototype (TTP)?

The Cadence solution is to use Protium. That's it in the above photo with its side panel off. There are options, but basically Protium consists of two pieces: a hardware box that contains the FPGAs, potentially along with daughtercards, and a compiler that reads in the RTL and prepares the bitstream, the code that programs the FPGAs. The great thing about this flow is that over 80% of the code is the same as that used to prepare a design for Palladium. A design that runs on Palladium should run directly on Protium without having to make any changes. The software takes care of:

Complex clocking structures
Memory compilation and modeling
Multi-FPGA partitioning

I am not going to pretend that this approach will automatically give you as optimal a result as you will get by putting several months of dedicated work into doing it yourself. But it is automatic and is usually within a factor of 2 or 3 of that optimum solution. You can add daughtercards for memory and you have access to a SpeedBridge interface to connect up your FPGA prototype to the real world.

If you want to put some effort into the compilation, then there are a few things you can do:

Provide additional manual guidance to the partitioning, based on your design knowledge
"Black box" certain blocks by providing an optimized solution that the compiler can just drop in. This is typically used for high-speed interfaces. The rest of the design is processed automatically

So you really get four options with Protium:

Fully automatic: 3-10MHz
Black boxing: some blocks 100MHz (often used for high speed interfaces)
Manual guidance: 10-20MHz (requires design knowledge)
Fully manual: 3-100MHz (standard Xilinx Vivado flow—Protium contains Xilinx Virtex-7 chips)
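To make those clock rates concrete, here is a rough back-of-the-envelope sketch of how long a software workload would take at each speed. It assumes one vector corresponds to one clock cycle, and the figure of 5 billion cycles is purely illustrative (the text above only says "billions"), not a measurement of any real boot. The black-boxing option is left out because its 100MHz figure applies only to some blocks, not to the whole design. The function and variable names are hypothetical.

```python
# Rough wall-clock estimate: seconds = cycles / frequency.
# BOOT_CYCLES is an assumed, illustrative figure, not a measurement.
BOOT_CYCLES = 5_000_000_000

# Whole-design clock ranges in MHz, taken from the options listed above.
OPTIONS_MHZ = {
    "fully automatic": (3, 10),
    "manual guidance": (10, 20),
    "fully manual": (3, 100),
}


def run_time_seconds(cycles: int, mhz: float) -> float:
    """Time to execute `cycles` clock cycles at `mhz` megahertz."""
    return cycles / (mhz * 1_000_000)


def estimate(cycles: int = BOOT_CYCLES) -> dict:
    """Return (slowest, fastest) run time in seconds for each option."""
    return {
        name: (run_time_seconds(cycles, low), run_time_seconds(cycles, high))
        for name, (low, high) in OPTIONS_MHZ.items()
    }


if __name__ == "__main__":
    for name, (slow, fast) in estimate().items():
        print(f"{name:16s} {fast / 60:6.1f} to {slow / 60:6.1f} minutes")
```

Under these assumptions, the same workload takes roughly 28 minutes at 3MHz, a little over 8 minutes at 10MHz, and under a minute at 100MHz, which is why the software team cares so much about where in these ranges the prototype lands.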

Protium can be accessed remotely. There is no requirement for it to be in the same country as you. As I said above, the FPGAs in Protium are Virtex-7, but in the next generation they will be UltraScale, which will also allow PCIe 3 support.

Using Protium with Palladium

If you think about it, Palladium and Protium are complementary. Palladium is slow (at least compared to Protium; it is fast compared to simulation, of course) but it has full visibility for debug. Protium is fast but it is a pain for debugging the hardware. Adding a new probe might take 12 hours or more. So the ideal flow would combine both: use Protium for speed, and when a problem is encountered, transfer the design to Palladium to debug it rather than trying to fix it on Protium itself. At CDNLive earlier this year, Microsemi described such a flow and I wrote an article: Palladium and Protium Platforms: the Hardware Twins.

Summary in One Diagram
