Protium: Next Generation FPGA Prototyping

28 Feb 2017 • 5 minute read

FPGA prototyping is a very attractive tool for some aspects of verification. Apart from actually getting silicon back from manufacturing, it is the fastest model of a chip that you can get. It is really only appropriate very late in the design cycle, once the design is reasonably stable and complete. The main reason for this is the "well...duh" fact that FPGA prototyping requires synthesizable RTL to be available for the whole SoC (or the parts that are being prototyped, that is).

The other reason is the dirty secret of FPGA prototyping: no matter how nice the words look on the PowerPoint, bringing up an FPGA prototype has historically been very time-consuming, with bringup measured in months. A lot of manual changes to the RTL were required: manually partitioning the design among multiple FPGAs, altering the clock architecture to comply with FPGA limitations, and manually modeling memories (which don't map well onto the underlying FPGAs).

In practice, FPGA prototyping is most attractive for software development once the chip design is close to complete. It can be used for some extensive testing of system level interaction between components such as multicore microprocessors. But earlier in the design cycle, when the RTL is not stable, it isn't so effective.

Another predictable thing is that every couple of years, performance and capacity requirements will increase. Luckily Xilinx is on the case, and FPGA prototyping systems can piggyback on their developments. Yesterday, I wrote "Xcelium: Parallel Simulation for the Next Decade" about the new simulation engine. Also announced that day was the next-generation Protium S1 FPGA-based prototyping system. It comes with new hardware, with 6X the capacity and 2X the performance of the previous generation. The hardware is based on Xilinx Virtex UltraScale FPGA technology.

Reduced Bringup Time

Design bringup time is reduced by an average of 80%, typically 1-2 weeks as opposed to months. There are a number of reasons for this reduction, perhaps the most important being advanced clock handling and automated memory support.
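As a quick sanity check on those numbers, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the idea of back-computing a baseline are mine, not anything from Cadence, and the 80% figure is an average, so real designs will vary.

```python
def implied_baseline_weeks(weeks_after: float, reduction: float) -> float:
    """Back out the original bringup time implied by a fractional reduction.

    If bringup now takes `weeks_after` and that represents a `reduction`
    (e.g. 0.80 for 80%) relative to the old time, then
    old_time * (1 - reduction) = weeks_after.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return weeks_after / (1.0 - reduction)


# The 1-2 week range quoted above, with the average 80% reduction.
low = implied_baseline_weeks(1, 0.80)
high = implied_baseline_weeks(2, 0.80)
print(f"Implied traditional bringup: about {low:.0f} to {high:.0f} weeks")
```

In other words, 1-2 weeks at an 80% reduction corresponds to a traditional bringup of roughly 5 to 10 weeks, which is consistent with "measured in months".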

The advanced clock handling means that a design is not limited by the clocking restrictions of the underlying FPGA architecture. In particular, there can be an unlimited number of design clocks without any hold violations. The timing closure of the FPGA compile process is also improved.

There are several levels of advanced memory support, depending mostly on the size of memory required:

Smaller memories are automatically compiled into FPGA-internal resources, with a fully automatic compile, and running at full design speed
128MB static memory cards (XSRAM) can be used, with a fully automatic compile, and running at full design speed
8/16GB dynamic memory cards (XDRAM) can be used at up to 6MHz, with a semi-automatic compile
Direct connect memory card (DCMC) can be used at full design speed although some design changes may be required
Custom memory cards (FCMC) can be used at full speed, although obviously requiring a custom development
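To make the size-based tiers concrete, here is a minimal Python sketch of how one might pick among the first three options. It is not a Protium API: the names are hypothetical, and the cutoff for "smaller memories" is not given above, so the 1MB threshold is purely an assumption for illustration. DCMC and FCMC are left out because they are chosen for reasons other than size.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024
GB = 1024 * MB

# ASSUMPTION: the article only says "smaller memories"; 1MB is a made-up cutoff.
ASSUMED_INTERNAL_LIMIT = 1 * MB


@dataclass(frozen=True)
class MemoryOption:
    name: str
    max_bytes: int      # largest memory this option is assumed to cover
    compile_flow: str   # as described in the list above
    speed: str          # as described in the list above


# Ordered from smallest to largest capacity.
OPTIONS = [
    MemoryOption("FPGA-internal", ASSUMED_INTERNAL_LIMIT,
                 "fully automatic", "full design speed"),
    MemoryOption("XSRAM", 128 * MB, "fully automatic", "full design speed"),
    MemoryOption("XDRAM", 16 * GB, "semi-automatic", "up to 6MHz"),
]


def suggest_memory_option(size_bytes: int) -> Optional[MemoryOption]:
    """Return the first size-based option that fits, or None if none does."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    for option in OPTIONS:
        if size_bytes <= option.max_bytes:
            return option
    return None  # beyond the standard cards; DCMC/FCMC would need a human decision


choice = suggest_memory_option(64 * MB)
print(choice.name, "/", choice.compile_flow, "/", choice.speed)
```

The point of the ordering is that the fully automatic, full-speed options are tried first, and the slower semi-automatic XDRAM path is only suggested when the memory is too large for them.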

Congruent with Palladium

Protium S1 is fully congruent with Palladium Z1, in the sense that it takes the same RTL input and has the same design behavior. Also, like Palladium, Protium is designed to be an enterprise resource, accessed over the network and usable from anywhere. It is not necessary to be in the same room as historically has been the case with FPGA prototyping.

The two technologies are complementary. Palladium Z1 has the best debug (all signals are visible without deciding in advance which ones you want to observe), and there are rich facilities for power and performance analysis. It is intended for both SoC hardware development and software development. On the other hand, Protium S1 has even higher performance and is the ideal solution for software development and for running hardware and software regressions. The same SpeedBridge adapters can be used with both systems to connect the modeled SoC to the real world through interfaces like PCIe.

One flow that some Cadence customers use is to run regressions on Protium S1. If a hardware issue is detected, then instead of trying to debug that issue on Protium, the design is moved to Palladium, where the more powerful debug facilities can be used to discover the root cause quickly.

Debugging

There are many features for hardware and software debug. The basics are capture of predefined signals for later offline waveform viewing, a JTAG debugger interface, and a UART interface.

More advanced features:

Waveforms across partitions: design-centric rather than FPGA-centric
Force/release signal: preselected signals can be forced to 0 or 1 during runtime
Real-time monitoring of preselected signals
Backdoor memory access: snooping into memory without going through the memory interface, or uploading values
Clock control to start/stop clock
Support for the Standard Co-Emulation Modeling Interface (SCE-MI)

Scalable Performance

It is possible to get still more performance out of Protium than the fully automatic process described above can deliver. The fully automated process typically brings up a system in 1-2 weeks. In about twice that time, additional effort can be put into optimizing critical paths, using high-effort place & route, optimizing memory ports, and so on. Focusing on the bottlenecks and eliminating them can drive the performance up a lot (this is very design-dependent) with a small amount of effort.

All of the tricks that have traditionally been used in FPGA bringup can be used, too. One feature is that a block can be hand-optimized in some way, and then Protium can "black box" it, taking the optimized version and using it while automatically processing the rest of the design. Major changes to the RTL, such as re-partitioning, simplifying the clocks, and pulling out unused test logic, can also be made. The turnaround time to do this is generally about 3 months (the traditional FPGA bringup time), but for many designs, the potential performance improvements are large.
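The three effort levels described in this section can be summarized as a small lookup, sketched here in Python. The tier names are my own labels, "about twice that" is read as roughly 2-4 weeks in total, and "about 3 months" is rounded to 13 weeks; all of these are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BringupTier:
    name: str        # hypothetical label, not a Cadence term
    max_weeks: int   # upper end of the turnaround time described above
    approach: str


# Ordered by increasing effort (and, design permitting, increasing performance).
TIERS = [
    BringupTier("fully automatic", 2,
                "automated flow, typically 1-2 weeks"),
    BringupTier("targeted optimization", 4,  # ASSUMPTION: "about twice that"
                "critical paths, high-effort place & route, memory ports"),
    BringupTier("manual RTL changes", 13,    # ASSUMPTION: about 3 months
                "re-partitioning, simpler clocks, removing unused test logic"),
]


def tiers_within(budget_weeks: float) -> list[str]:
    """List the effort tiers whose worst-case turnaround fits the schedule."""
    return [tier.name for tier in TIERS if tier.max_weeks <= budget_weeks]


print(tiers_within(4))
```

With a four-week budget, the sketch lists the first two tiers, which matches the idea that the biggest gains per unit of effort come from attacking bottlenecks rather than reworking the RTL.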

Summary

In summary, the Protium S1 FPGA-based prototyping platform is part of the Cadence Verification Suite, which cuts across multiple technologies: formal, simulation, and emulation, as well as FPGA prototyping. Design bringup time is reduced by an average of 80%. The system is congruent with Palladium Z1, taking the same RTL and exhibiting the same behavior, and using the same SpeedBridge adapters. There is a rich portfolio of advanced debugging facilities. The capacity is 200 million gates or 600 million gates.

Pop quiz: Is Protium really a word? And what is it?

You've probably heard of deuterium and tritium. Alternatively, you've probably heard the phrase "heavy water", even if you are not sure what it is. In fact, heavy water, used in some nuclear reactors, is water with deuterium atoms in place of ordinary hydrogen. Deuterium has a nucleus with a single proton (or else it would be something other than hydrogen) and a neutron too. Tritium has two neutrons. Light hydrogen, where the nucleus has no neutrons and just its proton, is also called protium when it is important to distinguish it from other isotopes of hydrogen. It also starts with the same letters as "prototype", which is why it was picked as a name for this product.