Paul McLellan
SystemVerilog
Incisive
Verilog
rocketick
rocketsim
simulation

Putting a Rocket Under Incisive

22 Feb 2017 • 3 minute read

When Cadence first acquired RocketSim, I wrote a post, Omnia Simulation in Tres Partes Divisa Est, about how simulation was like Gaul, divided into three parts. The three parts were:

  1. Interpreted simulation (byte codes called p-code, an outgrowth of UCSD Pascal)
  2. The compiled code era, either through intermediate C or later natively compiled directly
  3. The parallel era just getting started

 Of course, Cadence has had products in all three generations:

  1. Verilog-XL, acquired from Gateway in 1989, almost 30 years ago
  2. NC-Verilog and Incisive
  3. Incisive + RocketSim

I also wrote in Phil Moorby and the History of Verilog about the early part of this history when Phil, the creator of Verilog, was inducted as a Computer History Museum fellow. He was also, as it happens, Cadence's first fellow. Phil showed his early notebook when they were brainstorming a name for the language. We all came closer than we knew to designing chips in Valigol.

Parallelizing a design rule checker is comparatively easy. To a first approximation, the chip can be divided into pieces and each of them checked in parallel. A polygon on one side of the chip cannot cause an error on the other side of the chip. Simulation is not like that. A signal transition on one side of the chip can affect the simulation anywhere else on the chip. If one part of the simulation runs ahead of the other parts, it risks violating causality when a signal from the future gets sent back into a part of the chip still running in the past. There are some tricks that can be played with speculative execution and backing up the simulation, which is what makes mixed analog/digital simulation work, but that approach is limited. Ultimately, all parts of the simulation must agree on the time, and a constantly changing global value like that is a killer for parallelization because the cost of communicating that global value can overwhelm the gains from parallelization.
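The causality constraint can be made concrete with a small sketch. The code below is a toy lockstep scheme, not how Incisive or RocketSim is implemented: the `Partition` class, the `wires` mapping, and the fixed `delay` are all hypothetical names chosen for illustration. Each partition may only process events up to a globally agreed time, and that agreement (the `min` over all partitions) is the shared value the paragraph above describes as the bottleneck.

```python
import heapq

class Partition:
    """One slice of the design with its own event queue (hypothetical structure)."""
    def __init__(self, name):
        self.name = name
        self.queue = []   # min-heap of (time, signal, value)
        self.log = []     # events processed, in order

    def next_time(self):
        return self.queue[0][0] if self.queue else float("inf")

    def schedule(self, time, signal, value):
        heapq.heappush(self.queue, (time, signal, value))

    def run_until(self, limit):
        """Process every local event with timestamp <= limit and return them."""
        done = []
        while self.queue and self.queue[0][0] <= limit:
            event = heapq.heappop(self.queue)
            self.log.append(event)
            done.append(event)
        return done


def simulate(partitions, wires, delay=1, end_time=1000):
    """Lockstep loop: no partition is allowed to run ahead of the agreed time.

    wires maps (source partition name, signal) -> destination partition name.
    """
    by_name = {p.name: p for p in partitions}
    while True:
        # Global synchronization point: everyone must agree on 'now'.
        now = min(p.next_time() for p in partitions)
        if now > end_time:  # also covers the "all queues empty" case (inf)
            break
        # Step 1: in a real parallel simulator each partition would run on its own core.
        processed = {p.name: p.run_until(now) for p in partitions}
        # Step 2: exchange cross-partition signals, always into the receiver's future.
        for name, events in processed.items():
            for time, signal, value in events:
                dest = wires.get((name, signal))
                if dest is not None:
                    by_name[dest].schedule(time + delay, signal, value)


a, b = Partition("a"), Partition("b")
a.schedule(0, "clk", 1)
a.schedule(5, "req", 1)
b.schedule(3, "ack", 0)
simulate([a, b], {("a", "req"): "b"})
print(b.log)  # [(3, 'ack', 0), (6, 'req', 1)]
```

Because partition `b` is never allowed past the agreed time, the `req` signal sent by `a` at time 5 arrives at time 6 without `b` having already simulated beyond it. The price is one round of global agreement per time step, which is exactly the communication cost that can swamp the gains.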

Early attempts at parallel simulation technology for designs focused on separating hierarchical structures. Each sub-hierarchy was then assigned to a processor and the simulation was connected through interprocess communication. These attempts uniformly failed to fulfill the promise of parallel simulation. Scalability was limited to two to four cores yielding less than 2X improvement.
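One way to see why such schemes plateau is Amdahl's law, which bounds the speedup when only part of the work can run in parallel. The sketch below is a general model, not a measurement of any simulator mentioned here, and the 50% parallel fraction is an assumption picked purely for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work runs in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Assumed for illustration: half of the simulation work is parallelizable.
for cores in (2, 4, 16, 64):
    print(cores, round(amdahl_speedup(0.5, cores), 2))
```

With half of the work stuck on one core, the bound stays below 2X no matter how many cores are added. The model ignores interprocess communication entirely, so real results for such a partitioning would be worse than these figures.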

Successful parallel simulation must account for multiple cores, clocking domains, complex interconnect fabrics, hundreds of IP cores, and many other components of a design. Developing this capability takes a lot of hard work and ingenuity to analyze the billions of computing elements and the dependencies among them in order to identify the short/simple calculations that can be parallelized and distributed across multiple cores on a server. Focusing on the language level and identifying dependencies among independent threads of execution has led to the creation of a parallel simulation engine combining the existing Incisive simulation technology with the RocketSim technology acquired last year. The parallel simulator delivers up to 10X speed improvement, depending on the design style and the type of simulation:

  • 3X on average for SystemVerilog RTL simulation
  • 5X on average for gate-level simulation
  • 10X on average for gate-level DFT fault simulation

The simulator runs on standard multi-core servers (up to 64 cores). The RocketSim engine separates the simulation into portions that can and cannot be accelerated. The parts that can be accelerated, such as the gate-level or SystemVerilog RTL portion, are placed on the parallel engine. There is no need to change the testbench, the design, or the assertions to use the technology. The diagram below shows how the dependencies in the simulation are used to divide the simulation up into pieces that can run on the different cores, and then the Rocketick virtual machine is run on the underlying multi-core physical hardware.
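The post does not describe how the engine actually analyzes dependencies, so the following is only a generic sketch of the idea: levelizing a dependency graph so that everything in one level can be evaluated at the same time. The function name `levelize` and the tiny netlist are hypothetical.

```python
from collections import defaultdict

def levelize(deps):
    """Group nodes into levels; nodes within a level do not depend on each other.

    deps maps each node to the set of nodes whose outputs it reads.
    """
    nodes = set(deps)
    for inputs in deps.values():
        nodes |= set(inputs)
    level = {}

    def depth(node, visiting=()):
        if node in level:
            return level[node]
        if node in visiting:
            raise ValueError(f"combinational loop through {node!r}")
        inputs = deps.get(node, ())
        # A node with no inputs lands on level 0; otherwise one past its deepest input.
        level[node] = 1 + max(
            (depth(i, visiting + (node,)) for i in inputs), default=-1
        )
        return level[node]

    groups = defaultdict(list)
    for node in sorted(nodes):
        groups[depth(node)].append(node)
    return [groups[k] for k in sorted(groups)]


# Hypothetical netlist: two gates feeding a third.
netlist = {"n1": {"a", "b"}, "n2": {"b", "c"}, "out": {"n1", "n2"}}
print(levelize(netlist))  # [['a', 'b', 'c'], ['n1', 'n2'], ['out']]
```

Here `n1` and `n2` share no dependency on each other, so they could be evaluated on different cores once the inputs are known, while `out` has to wait for both. A production engine would work at a far finer grain and would also have to balance the load across cores, which this sketch does not attempt.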

Design size will continue to drive requirements for more speed and more capacity. There is simply never enough time in the schedule to perform all of the simulations. Simulation is never "finished." Simulation technologies will have to evolve over time, just as they have over the last 30+ years since RTL simulation was invented. The RocketSim engine is the first step in this new, more parallel, future. At last year's Design Automation Conference (DAC), Adam Sherer agreed that John Cooley had the Cadence roadmap largely correct. It isn't rocket science to predict that Cadence will integrate the technologies more tightly for even greater speedup, rather than operating through relatively slow APIs.

There is a white paper available, Massive SoC Designs Open Doors to New Era in Simulation.