Changing the Game with Processor Based Emulation

11 Oct 2012 • 7 minute read

I have always been fascinated by game changing moves. Some are more successful than others, but the general principle is always the same - bringing a gun to a knife fight. Two of my favorites are from sports. When I was a young rower, the moving outrigger was a fascinating example of a game changing invention. The usual moving seat with fixed shoes and outrigger was replaced by a fixed seat with moving outrigger and shoes. The advantages were proven scientifically and Peter Michael Kolbe became world champion in 1981 using the technology. The moving outrigger was banned from competition at the end of 1983.

The other cool example is that of Andy Granatelli, who entered the 1967 Indianapolis 500 with an untraditional piece of equipment - a turbine engine, which was much faster and had tremendous advantages over its reciprocating-engine rivals. Parnelli Jones drove the car in the lead for 171 laps before a simple bearing failure took it out of the competition. Granatelli had challenged the status quo by attacking the common notion of the underlying engineering architecture.

In our day to day lives in Silicon Valley high tech, this happens quite often. Clayton Christensen outlined this with the term "Disruptive Innovations" and you can pick your favorite here. In EDA I am lucky to be involved in one of the most exciting markets right now - that of system development including hardware assisted verification. The latter enjoys some special recent attention ...

So what do rowing, the Indy 500 and Emulation have in common? Disruptive Innovation! FPGAs - which long had been at the core of emulation - are fundamentally not scaling well and the disruption is processor based emulation. Later in this post I will introduce three generations of emulation, and a couple of real life user reviews comparing how these measure up against user requirements can be found here and here.

So what is the disruption?

Earlier this year I was on a DVCon Panel called "Build or Buy: Which is the Best Practice for Hardware-assisted Verification?" What I remember most vividly to this day is the comment that John Goodenough from ARM, a fellow panelist, made repeatedly: "Well, everything is fine if your design fits into one FPGA. If it doesn't, then all bets are off!"

Let's take a step back and look at what hardware assisted verification is used for (I described this in more detail here in the context of system development). Before a chip is taped out it needs to be verified, and the tasks to do that can be categorized into four main use models - hardware verification, software testing, hardware/software integration and performance analysis/verification. The seven basic execution engines to run verification on - Software Development Kits (SDK), Virtual Prototypes, RTL Simulation, Acceleration, Emulation, FPGA Based Prototyping and the actual silicon in a Development Board - all have different strengths and weaknesses in areas like execution speed, bring-up time, HW debug and SW debug. Hardware assisted verification is really used in three forms during a project.

Here is what a typical flow looks like.

First, increased design complexity results in pure RTL Simulation becoming too slow even before the RTL matures. Acceleration - the combination of RTL simulation and hardware acceleration - executes about 10x, 100x, sometimes 1,000x faster than pure RTL Simulation, but is generally limited to the kHz range. How applicable acceleration is to a project really depends on how fast a design can be brought up and how well hardware debug can be performed.

Second, emulation executes the full design in hardware, either with a synthesizable testbench or with connections to the system environment using rate adapters. Pure emulation gets the execution speed into the MHz range, fast enough to allow initial software testing. If RTL is still not fully mature yet, design teams have to trade off bring-up effort and bring-up time against the value of faster execution. Both hardware and software debug are important.

Third, for even faster execution, FPGA-based prototypes can get into the speed range of tens of MHz, usable for verification regressions and software testing. Given the focus on speed, design teams settle for fewer hardware debug features.

Going back to John Goodenough's panel comment, getting the design into multiple FPGAs is the crux of what disruptive innovation in emulation is about.

The more scientific way of looking at the problem is based on Rent's Rule. It predicts that a large FPGA will run out of pins long before all its available capacity is utilized (see graph below). Rent's Rule applies when a large design is pseudo-randomly partitioned into many smaller segments, and it statistically predicts the number of connections required based on the number of gates in the partition.
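To make the pin-versus-capacity argument concrete: Rent's Rule is commonly written as T = t * g^p, where g is the number of gates in a partition, T the predicted number of external connections, t the average number of terminals per gate and p the Rent exponent. The sketch below simply evaluates that relationship. The values t = 2.5 and p = 0.6 are illustrative assumptions, not measurements from any particular design or FPGA, and the function names are made up for this example.

```python
def rent_terminals(gates, t=2.5, p=0.6):
    """Rent's Rule estimate T = t * g**p of the external connections
    needed by a partition containing `gates` gates.
    t and p are assumed, illustrative values."""
    return t * gates ** p


def usable_gates(pins, t=2.5, p=0.6):
    """Invert Rent's Rule: the largest partition whose predicted
    connection count still fits into `pins` physical pins."""
    return (pins / t) ** (1.0 / p)


if __name__ == "__main__":
    # Connections grow more slowly than gates, but pin counts on a
    # package grow more slowly still, which is where the squeeze comes from.
    for gates in (10_000, 100_000, 1_000_000, 10_000_000):
        print(f"{gates:>10,} gates -> ~{rent_terminals(gates):,.0f} connections")
    print(f"1,000 pins support roughly {usable_gates(1_000):,.0f} gates")
```

Under these assumed parameters, a device whose logic capacity exceeds what `usable_gates` returns for its pin count would be pin limited before its capacity is used up.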

As a result, FPGA based prototypes need very special attention when it comes to partitioning and routing the design to be mapped into them, specifically because the FPGA capacity grows much more quickly than the available bandwidth between them. Time division multiplexing of wires seems to be the answer: several signals share the bandwidth of one wire and the less bandwidth that is required per signal, the more signals per wire can be used at the same execution speed. However, in reality small bandwidth per signal is hard to achieve in FPGAs because route delay unpredictability and timing constraints increase compilation time significantly. With unpredictable timing, capacity does not really scale and performance degrades quickly.
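The multiplexing trade-off can be sketched with a deliberately simplified model: if more signals cross a partition boundary than there are physical wires, each wire carries several signals in turn, and every design clock cycle has to leave room for all of those time slots. The function names and all numbers below are hypothetical, and the model ignores synchronization overhead and routing delay, so it is an optimistic upper bound rather than a prediction for any real system.

```python
def mux_ratio(cut_signals, physical_wires):
    """Number of signals that must share each wire (rounded up)."""
    if physical_wires <= 0:
        raise ValueError("physical_wires must be positive")
    # Integer ceiling division; at least one time slot per wire.
    return max(1, (cut_signals + physical_wires - 1) // physical_wires)


def max_design_clock_hz(wire_rate_hz, ratio):
    """Upper bound on the design clock when every cycle must
    transfer `ratio` time slots over a wire toggling at wire_rate_hz."""
    return wire_rate_hz / ratio


if __name__ == "__main__":
    wires = 1_000            # assumed pins available between two FPGAs
    wire_rate = 100e6        # assumed 100 MHz inter-FPGA signaling rate
    for cut in (1_000, 4_000, 16_000, 64_000):
        ratio = mux_ratio(cut, wires)
        clock = max_design_clock_hz(wire_rate, ratio)
        print(f"{cut:>6} cut signals -> {ratio:>2}:1 mux, "
              f"design clock <= {clock / 1e6:.2f} MHz")
```

Even in this idealized model the achievable design clock falls in proportion to the multiplexing ratio, which is the scaling problem described above.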

As a result users can find three generations of emulation:

First generation emulation relied (and relies today) on commercial FPGAs. Cadence actually used to have such a first generation emulation capability with the Axis Xtreme. It was focused on acceleration. But it turned out that the bring-up time - which is of crucial importance especially at the time when RTL is not mature yet, and multiple RTL revisions per day are made available - was too long to make it effective for acceleration.

Second generation emulation still did not make a disruptive change; it simply replaced commercial FPGAs with custom FPGAs. This allows incremental improvements for debug and routing difficulties, improving the time to bring-up somewhat. However, it did not make changes fundamental enough to overcome issues like Rent's Rule illustrated in the graph associated with this blog.

Third generation emulation - the disruptive change - happened about ten years ago when the Palladium team simply switched the fundamentals and replaced FPGAs with a multiprocessor engine, creating a processor based emulation system. The result is an engine with fixed bandwidth per signal, leading to scalability without performance degradation. In addition to the FPGA related limitations in debugging a mapped design, the fundamental FPGA issue of long bring-up time due to partitioning and routing difficulties has been addressed by using static scheduling in a processor based verification computer - that's why we call it a Verification Computing Platform (VCP) today. Instead of long FPGA routing runs, which can take 12 hours or more on multiple workstations, a compiler maps the design to Palladium at a rate of 50-70M gates per hour on a single workstation.
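As a rough illustration of why static scheduling makes timing predictable, the toy scheduler below assigns every gate of a small netlist to a fixed (time step, processor) slot at compile time. It is a minimal sketch under simplifying assumptions and not a description of the Palladium compiler: the netlist format, the names `static_schedule` and `schedule_length`, and the one-gate-per-processor-per-step model are all hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def static_schedule(netlist, num_processors):
    """Assign each gate a fixed (step, processor) slot.

    netlist maps a gate name to the names it reads. Names that are not
    keys of netlist are treated as primary inputs, available before step 0.
    """
    if num_processors < 1:
        raise ValueError("need at least one processor")
    slots = {}
    busy = defaultdict(int)  # step -> processors already used in that step
    for gate in TopologicalSorter(netlist).static_order():
        if gate not in netlist:
            continue  # primary input, nothing to evaluate
        # A gate can run only after every gate it reads has been evaluated.
        step = max((slots[d][0] + 1 for d in netlist[gate] if d in slots),
                   default=0)
        while busy[step] >= num_processors:
            step += 1  # all processors taken in this step, try the next one
        slots[gate] = (step, busy[step])
        busy[step] += 1
    return slots


def schedule_length(slots):
    """Number of steps needed to evaluate one design clock cycle."""
    return 1 + max(step for step, _ in slots.values())


if __name__ == "__main__":
    example = {
        "n1": ["a", "b"],
        "n2": ["c", "d"],
        "n3": ["n1", "n2"],
        "out": ["n3", "a"],
    }
    for procs in (1, 2):
        plan = static_schedule(example, procs)
        print(procs, "processor(s):", schedule_length(plan), "steps", plan)
```

The point of the sketch is that the number of steps per design cycle is known as soon as compilation finishes, so there is no place-and-route timing closure to iterate on; adding processors shortens the schedule only up to the depth of the logic.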

Besides other advantages like multi user access and high granularity, the results - as I noted in my recent Frankly Speaking Blog - are significantly faster bring-up times and better debug productivity, eventually resulting in the ability to do six or more design turns in a 24 hour day. This is important in those project phases in which RTL is not fully mature yet.
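A quick back-of-the-envelope calculation shows how compile time bounds the number of design turns per day. The 50M gates per hour rate (the low end of the 50-70M range) and the 12 hour FPGA routing run come from the figures above; the 100M gate design size and the two hours of run and debug time per turn are hypothetical values chosen only for illustration.

```python
def compile_hours(design_gates, gates_per_hour):
    """Time to map a design at a given compile throughput."""
    return design_gates / gates_per_hour


def turns_per_day(compile_h, run_and_debug_h):
    """Whole design turns that fit into 24 hours when each turn
    consists of one compile plus one run-and-debug session."""
    return int(24 // (compile_h + run_and_debug_h))


if __name__ == "__main__":
    design = 100_000_000      # assumed design size in gates
    run_debug = 2.0           # assumed hours of run and debug per turn

    slow_compile = compile_hours(design, 50_000_000)   # low end of range
    print("processor based:", turns_per_day(slow_compile, run_debug), "turns/day")
    print("12 hour routing:", turns_per_day(12.0, run_debug), "turns/day")
```

With these assumed numbers the processor based flow fits six turns into a day even at the low end of the compile rate, while a 12 hour routing run leaves room for only one.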

This all may sound bad for FPGA-based systems, but it really isn't. It is just important to emphasize when and for which use model they are applicable. They work at higher speeds that allow them to be connected to the design's environment, and they are ideal for software testing and hardware verification regressions. They are best used when the RTL is very mature and stable, and therefore hardware debug is no longer an important issue. And by the way, as part of the System Development Suite, we offer such a system as well - it is called RPP, short for Rapid Prototyping Platform.

Bottom line, when choosing the right system for hardware acceleration, it is important for users to clearly understand their options between acceleration, emulation and FPGA based prototyping. Their choice needs to be driven by their use model and in which project phase they want to use it. Again, a couple of real life user reviews comparing how the three generations of emulation measure up against user requirements can be found here and here.

It is fun to be part of disruptive innovations in EDA!

Frank Schirrmeister
