Lightelligence Accelerates DFT Simulation Using Cadence Xcelium Multi-Core

17 Apr 2023

The growing design complexity of today’s systems on chip (SoCs) can result in design for testability (DFT) simulations that run for hours, days, or even weeks. Because these simulations tend to occur at the end of an application-specific integrated circuit (ASIC) project, an engineering change order (ECO) that forces a rerun can delay the tapeout schedule by days or weeks. DFT simulations have very long runtimes because they typically have high event density (events per unit of simulation time), and traditional event-driven simulators slow down as event density increases. A new approach is needed.

Lightelligence is a technology company specializing in artificial intelligence and optical computing. The company focuses on developing novel technologies for machine learning, computer vision, and data analysis that leverage the unique properties of light to power the next generation of innovations. Lightelligence is the only company worldwide to demonstrate a working and fully integrated optical computing system.

Lightelligence aims to enable high-speed, low-power, and energy-efficient AI solutions for various applications. Cadence’s Xcelium multi-core gate-level simulation (GLS), among other Cadence products and solutions, has been vital to Lightelligence in building its technology. The Lightelligence team approached Cadence to speed up an exceptionally long DFT simulation, because long-latency simulations were impacting their tapeout milestone. This blog explains how an efficient gate-level simulation methodology using Xcelium helped Lightelligence address long latency and reduce regression time.

Both Low- and High-Latency Simulations Can Impact Tapeout Milestones

High-volume, low-latency simulations execute in seconds to an hour, have a relatively low memory footprint, and run as hundreds to millions of jobs; their regression run time tends to peak during the late middle of the project. Low-volume, high-latency simulations, typically gate-level simulation (GLS), standard delay format (SDF), and DFT simulations, take hours to days or weeks, have a relatively high memory footprint, and run as tens to hundreds of jobs; their regression run time tends to peak late in the project, around tapeout. Xcelium solutions address both types of simulation challenges.

GLS Methodology to Address Long Latency in Traditional Simulation Engines

Multi-Threaded Waveform Dump

On a machine with multiple CPUs, a traditional simulator can improve the performance of waveform dumping. In multi-process mode, the simulator launches a separate executable that handles part of the waveform-dump processing. This process runs in parallel on a different CPU until the main process exits, yielding a performance improvement of 5% to 2X, depending on the number of objects probed.

Save/Restore and Checkpointing

Saving and restoring from the initialization state reduces runtime. The Xcelium simulator’s save and restore functionality vastly improves turnaround time by eliminating the time wasted rerunning test segments that are common to multiple long simulations. The design is initialized only once, and each subsequent run restarts from that saved point, avoiding hours of repeated initialization. Save and restore also checkpoints long-latency runs, which shortens debug cycles and lets regressions complete with higher throughput.
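To make the benefit concrete, here is a minimal back-of-the-envelope sketch in Python. It is not an Xcelium API or script; the function name, the per-test hours, the initialization time, and the restore cost are all hypothetical numbers chosen only to illustrate how a shared initialization checkpoint changes total regression time.

```python
from typing import Sequence


def regression_hours(
    test_hours: Sequence[float],
    init_hours: float,
    restore_hours: float = 0.0,
    use_checkpoint: bool = False,
) -> float:
    """Total simulation hours for a regression (toy model, assumed numbers).

    test_hours:     unique (post-initialization) runtime of each test
    init_hours:     time spent in the common initialization segment
    restore_hours:  assumed cost of restoring a saved snapshot, per test
    use_checkpoint: if True, initialize once and restore for every test
    """
    if use_checkpoint:
        # Initialization is paid once; every test then restores the snapshot.
        return init_hours + sum(restore_hours + t for t in test_hours)
    # Without a checkpoint, every test repeats the initialization segment.
    return sum(init_hours + t for t in test_hours)


# Hypothetical regression: three long tests sharing a 3-hour initialization.
tests = [10.0, 12.0, 8.0]
baseline = regression_hours(tests, init_hours=3.0)
checkpointed = regression_hours(
    tests, init_hours=3.0, restore_hours=0.1, use_checkpoint=True
)
print(f"without save/restore: {baseline:.1f} h")
print(f"with save/restore:    {checkpointed:.1f} h")
```

In this model the saving grows with the number of tests that share the initialization segment, which is why the technique pays off most in large regressions.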

Unique Simulation Solutions

Xcelium multi-core distributes simulation events among multiple cores, enabling faster simulation. In high-activity tests, many simulation events are processed at a single instant, and these events are spread over numerous server cores to accelerate the simulation. In low-activity tests, all but one core is blocked, which results in a high cache hit rate and therefore faster simulation.

Parallelism and Activity

Contrary to the general assumption, parallelism inside a design is not structural replication. Xcelium multi-core parallelism is language parallelism, which is present in all designs. Xcelium profiles tests for activity, because multi-core needs parallel blocks to be active. Activity is what offsets the overhead of managing communication between the cores: the more activity there is, the smaller the overhead becomes relative to the useful work.
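The trade-off between activity and communication overhead can be sketched with a toy cost model. This is an illustrative assumption, not how Xcelium schedules events: the function name, the per-event cost, and the per-core synchronization cost are hypothetical, and the model simply assumes work divides evenly across cores while synchronization cost grows with the core count.

```python
def multicore_speedup(
    events_per_step: float,
    cores: int,
    cost_per_event: float = 1.0,
    sync_cost_per_core: float = 50.0,
) -> float:
    """Estimated speed-up for one simulation time step (toy model).

    events_per_step:    activity, i.e. events to evaluate in the step
    cores:              number of cores sharing the events
    cost_per_event:     assumed cost of evaluating one event
    sync_cost_per_core: assumed communication overhead added per core
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_time = events_per_step * cost_per_event
    if cores == 1:
        return 1.0  # single core: no split and no synchronization overhead
    # Work is divided across cores, but each core adds synchronization cost.
    parallel_time = serial_time / cores + sync_cost_per_core * cores
    return serial_time / parallel_time


# Sweep activity levels on a hypothetical 4-core run.
for events in (100, 1_000, 10_000, 100_000):
    print(f"{events:>7} events/step -> {multicore_speedup(events, cores=4):.2f}x")
```

With these assumed constants, a low-activity step runs slower on four cores than on one, while a high-activity step approaches the ideal 4X, which mirrors the point that activity is what pays for the communication overhead.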

Xcelium Multi-Core Achieved 3X Speed-Up on ATPG Tests

The Lightelligence team experienced 5-day automatic test pattern generation (ATPG) simulations near tapeout. The ATPG tests were generated by the Cadence Modus DFT Software Solution. Using nearly the same script, Xcelium multi-core achieved a 3X speed-up on average across the tests in the ATPG regression.
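As a rough illustration of what that means for the schedule, the short Python sketch below converts a baseline runtime and a speed-up factor into an accelerated runtime. The helper name is hypothetical, and applying the reported 3X figure to a single 5-day run is an assumption made for the arithmetic; individual tests may see more or less.

```python
def accelerated_runtime_hours(baseline_days: float, speedup: float) -> float:
    """Runtime in hours after applying a speed-up factor to a baseline run."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_days * 24.0 / speedup


# Assumed case: a 5-day ATPG simulation that sees the full 3X speed-up.
hours = accelerated_runtime_hours(baseline_days=5, speedup=3)
print(f"{hours:.0f} hours (about {hours / 24:.1f} days)")
```

Under that assumption, a rerun triggered by a late ECO would cost well under two days instead of five.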

Long-latency gate-level functional and DFT simulations occur late in the design cycle, and changes that force the regressions to rerun delay tapeout. An efficient gate-level simulation methodology, including save/restore and Xcelium multi-core, reduces regression time. Lightelligence applied Xcelium multi-core with minimal changes to its DFT simulation setup to achieve up to 3X speed-up.

This blog is an excerpt from the CadenceLive Silicon Valley 2022 presentation by Guoqi Lu, Design Verification Engineer at Lightelligence. Contact us for more information on accelerating simulations using Cadence Xcelium multi-core technology.