I talked to Uri Tal last week; he has just joined Cadence as a result of the Rocketick acquisition, where he was CEO. He gave me a little history. Rocketick started development eight years ago on RocketSim, a product that accelerates logic simulation. They started by using GPUs to do this, but then switched to multi-core CPUs. They can run on all the cores in a socket, in practice up to 32 today, and like a surfer they will ride that wave as the number of cores per socket increases. I call this Core's Law: the number of cores on a processor doubles every two years.
The speedups with RocketSim are impressive:
Another part of the value proposition is that you don't need to change any code, neither the RTL (or netlist) nor the testbench. As a result, RocketSim is in production use at many leading system and semiconductor companies. Intel Capital and NVIDIA were investors and NVIDIA has endorsed them (Intel doesn't endorse EDA suppliers).
The sweet spot seems to be large SoCs that are highly active, such as stress tests. The speedups are smaller with circuits that are not very active, which you would expect: no simulator needs to spend time simulating the inactive parts of a circuit, and it is obviously not possible to take less time than none. The focus of RocketSim is running complex simulations as fast as possible, so it is latency rather than throughput that matters.
Under the hood, a simulation with Incisive and RocketSim runs the testbench on Incisive and the design simulation uses fine-grained parallelism to run on multiple threads. During compilation, the RocketSim engine's compiler runs on the host machine, separating the design from the testbench. During runtime, a large portion of the logic is offloaded to the RocketSim parallel simulation engine, running on standard multi-core servers, while the testbench runs on the host simulator. The PLI is used during simulation execution to maintain the state synchronization between the compiled-code simulator running the testbench and the RocketSim engine running the design.
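The split described above can be illustrated with a toy sketch. This is not the actual RocketSim or PLI API, just a hypothetical model of the architecture: the testbench runs in a host "simulator", the design evaluation is offloaded to a separate engine, and at every time step the two exchange boundary signal values, which is the role the PLI plays in the real flow. The `DesignEngine` and `HostSimulator` classes and the single-flop "design" are invented for illustration.

```python
# Hypothetical sketch of the host/engine split (NOT the RocketSim API):
# the host runs the testbench, the engine evaluates the design, and the
# two synchronize state at every cycle, as the PLI does in the real flow.

class DesignEngine:
    """Stands in for the parallel engine that evaluates the design.
    Here the whole 'design' is a single D flip-flop."""
    def __init__(self):
        self.q = 0  # one register of design state

    def evaluate(self, inputs):
        # Return outputs computed from the previous state, then update
        # state from the inputs pushed across the boundary.
        outputs = {"q": self.q}
        self.q = inputs["d"]
        return outputs

class HostSimulator:
    """Stands in for the host simulator running the testbench."""
    def __init__(self, engine, stimulus):
        self.engine = engine
        self.stimulus = stimulus
        self.trace = []

    def run(self):
        for cycle, d in enumerate(self.stimulus):
            # Synchronization point: push inputs, pull outputs each cycle.
            outputs = self.engine.evaluate({"d": d})
            self.trace.append((cycle, outputs["q"]))
        return self.trace

sim = HostSimulator(DesignEngine(), stimulus=[1, 0, 1, 1])
trace = sim.run()
# q follows d delayed by one cycle: [(0, 0), (1, 1), (2, 0), (3, 1)]
```

The point of the sketch is the synchronization boundary: only the signals that cross between testbench and design need to be exchanged each cycle, so the bulk of the design evaluation can run in parallel on the engine side.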
Simulation is challenging to parallelize compared to something like DRC, since there is at least a global clock and potentially other high-activity signals that don't have well-behaved locality. When I was at Virtutech, we could run large distributed simulations across many servers, but that was because we "cheated": we were not truly running a single system, but a lot of separate systems connected by networks. When you have a network like Ethernet, and processors running at GHz rates, there is no concept of the exact clock cycle on which a packet "should" arrive, so the need for perfect synchronization is lessened. An SoC is not really like that. The need for synchronization leads to two problems: it can consume a lot of the cores' bandwidth just to maintain simulated time, and if it is done wrong, there is a risk that causality will be violated, with an effect processed before its cause.
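To make the causality point concrete, here is a minimal sketch (my own illustration, not RocketSim's algorithm) of why events from parallel partitions must be consumed in global timestamp order: each partition's event queue is locally sorted, but processing the queues independently can apply an effect before its cause. The partition names and events are invented for the example.

```python
# Minimal illustration of causal ordering in parallel simulation (not
# RocketSim's algorithm): merge per-partition event queues, each already
# sorted by local timestamp, into one globally time-ordered stream.

import heapq

def merge_in_time_order(partition_queues):
    """Merge sorted per-partition (timestamp, event) lists so that no
    event is processed before an earlier-timestamped event elsewhere."""
    heap = []
    for pid, events in enumerate(partition_queues):
        if events:
            heap.append((events[0][0], pid, 0))
    heapq.heapify(heap)
    order = []
    while heap:
        t, pid, idx = heapq.heappop(heap)
        order.append((t, partition_queues[pid][idx][1]))
        if idx + 1 < len(partition_queues[pid]):
            nxt = partition_queues[pid][idx + 1]
            heapq.heappush(heap, (nxt[0], pid, idx + 1))
    return order

# Two partitions: the clock edge at t=5 in one partition must be seen
# before the flop update at t=6 in the other, even though each queue
# is local to its own partition.
queues = [[(5, "clk_rise"), (15, "clk_fall")], [(6, "flop_update")]]
print(merge_in_time_order(queues))
# [(5, 'clk_rise'), (6, 'flop_update'), (15, 'clk_fall')]
```

Maintaining that global order across many cores is exactly the bandwidth cost mentioned above: every partition must keep learning how far the others have advanced before it can safely commit its own events.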
Some key features of RocketSim are:
You can see RocketSim at DAC. They will be in the RocketSim booths 227 and 328 (which are almost in the same place) and in the Cadence booth 107. Experts from Rocketick will be in the Cadence booth at the expert bar on Tuesday from 2:30pm to 4:00pm, and on Wednesday from 1:00pm to 2:30pm. If you can't wait until DAC, take a look at this video demo.