Support for multithreaded/multi-core simulation has been available in Spectre for many years, allowing users to run a single analysis, for example, TRAN or HB, on a single machine with multiple cores. However, simulating designs that use advanced node processes can result in performance and capacity issues, especially when post-layout parasitics are included.
Spectre X introduced distributed simulation as an extension to the multithreaded simulation where cores from different machines are used. Distributed simulation provides access to more cores, thus increasing the performance and capacity and providing more value for large designs.
The "single analysis" distributed simulation above is different from distributed parametric sweeps or Monte Carlo simulations, where multiple analyses are run on different machines.
Why and When?
There are two reasons for running a Spectre X distributed simulation: performance and capacity. With a distributed simulation, you can increase performance by adding cores from different machines and increase capacity by combining the memory of each machine.
Distributed simulation for transient analysis performs best when the circuit is very large and post-layout. With the same number of cores, a non-distributed simulation on a single host is typically faster than a distributed multi-host simulation. For example, a 32-core (+mt=32) simulation on a single host is typically faster than a distributed 2 hosts x 16 cores simulation under the same circumstances.
If your circuit is small and you are running transient analysis, a distributed simulation may be slower than a non-distributed one because of the overhead that distribution requires.
For harmonic balance, it is not the circuit size, but the number of harmonics that decides whether the distributed simulation is beneficial or not.
There are two use models for a distributed simulation: local and farm.
If your machine network is not on a farm, you can run a distributed simulation on a local network by specifying the machine resources directly at the Spectre command line:
spectre +xdp=rsh|ssh +hosts "host1:proc1 host2:proc2 ..." +mt=#T
Here, host1 and host2 are the hostnames of the machines on the network on which you want to run the simulation. You can add as many hosts as you like; however, more than eight hosts will not yield the best performance. proc1 and proc2 are the number of Spectre processes to run on each host. Specify either 1 or the number of NUMA nodes (typically the number of sockets) on each machine; any other value will not yield optimum performance. #T is the number of threads for each requested process.
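The local use model can be sketched as a small dry-run helper that only assembles and prints the command line described above, so the host list can be checked before anything is launched. The function name build_xdp_cmd, its argument order, and the host names hostA and hostB are illustrative assumptions, not part of Spectre; only the +xdp, +hosts, and +mt options come from the text.

```shell
#!/bin/sh
# Dry-run sketch: build the local-network command line shown above.
# build_xdp_cmd is a hypothetical helper name, not a Spectre utility.
# Usage: build_xdp_cmd <rsh|ssh> <threads_per_process> <host:procs>...
build_xdp_cmd() {
    launcher=$1
    threads=$2
    shift 2

    case $launcher in
        rsh|ssh) ;;
        *) echo "launcher must be rsh or ssh" >&2; return 1 ;;
    esac

    if [ "$#" -lt 1 ]; then
        echo "at least one host:procs entry is required" >&2
        return 1
    fi

    # Each entry must look like host:procs, with procs starting at 1 or more.
    for spec in "$@"; do
        case $spec in
            ?*:[1-9]*) ;;
            *) echo "bad entry '$spec', expected host:procs" >&2; return 1 ;;
        esac
    done

    # The text notes that more than eight hosts will not give the best performance.
    [ "$#" -le 8 ] || echo "note: more than eight hosts requested" >&2

    # "$*" joins the entries with single spaces, as +hosts expects.
    printf 'spectre +xdp=%s +hosts "%s" +mt=%s\n' "$launcher" "$*" "$threads"
}

# Example: two hosts, one process per host, 16 threads per process.
build_xdp_cmd ssh 16 hostA:1 hostB:1
```

The example prints a command that requests (1 + 1) processes x 16 threads = 32 cores in total. Any netlist or further options still have to be added to the printed line by hand.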
It is more common to run a distributed simulation on a farm, and the Spectre command line for this is simple: you only add the +xdp option.
It is simple because the farm job submission command, bsub, requests all the resources. The farm provides the machines, processes, and threads, and Spectre X reads this information because the +xdp option is set.
The following is an example requesting a total of 128 cores on 4 machines on an LSF farm:
bsub -q <queue> -n 128 -R "span[ptile=32]" spectre +xdp
The above command will run a total of 4 Spectre processes, each with 32 threads. A single process will be run on each of the 4 machines resulting in 4*1*32=128 cores.
When you run a distributed harmonic balance simulation on an LSF farm, you can benefit from running one process per NUMA node (or socket). The benefit comes from shared memory utilization, and is enabled by adding +ppn at the Spectre command line.
bsub -q <queue> -n 128 -R "span[ptile=32]" spectre +xdp +ppn
The above command will run a total of 8 Spectre processes, each with 16 threads. In this case, the machines given by the farm each have 2 NUMA nodes (sockets), so 2 processes will be run on each of the 4 machines. This results in 4*2*16=128 cores for the analysis.
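The arithmetic behind the two farm examples can be sketched as a short shell function. It assumes that the farm places exactly ptile slots on every host and that -n is a multiple of ptile; the function name xdp_layout and its arguments are hypothetical, and the real process placement is decided by the farm and Spectre X, not by this script.

```shell
#!/bin/sh
# Sketch of the process/thread layout implied by the bsub examples above.
# xdp_layout is a hypothetical helper, not a Spectre or LSF command.
# Usage: xdp_layout <n_slots> <ptile> <numa_nodes_per_host> <use_ppn: 0|1>
# Prints: hosts processes threads_per_process total_cores
xdp_layout() {
    n=$1
    ptile=$2
    numa=$3
    ppn=$4

    if [ $((n % ptile)) -ne 0 ]; then
        echo "n must be a multiple of ptile" >&2
        return 1
    fi
    hosts=$((n / ptile))

    # Without +ppn: one process per host. With +ppn: one process per NUMA node.
    if [ "$ppn" -eq 1 ]; then
        per_host=$numa
    else
        per_host=1
    fi

    if [ $((ptile % per_host)) -ne 0 ]; then
        echo "ptile must be a multiple of the processes per host" >&2
        return 1
    fi
    threads=$((ptile / per_host))
    procs=$((hosts * per_host))

    echo "$hosts $procs $threads $((procs * threads))"
}

xdp_layout 128 32 2 0   # without +ppn: 4 hosts, 4 processes, 32 threads each
xdp_layout 128 32 2 1   # with +ppn:    4 hosts, 8 processes, 16 threads each
```

Both calls end with a total of 128 cores, matching 4*1*32 and 4*2*16 in the examples; only the split between processes and threads changes.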
The Spectre X distributed simulation retains the golden accuracy of Spectre X while increasing performance and capacity.
The Spectre X distributed simulation provides exceptional performance and capacity improvement compared to any other circuit simulator.
The chart below shows a large (500k M, 30M C, 3M R) post-layout transient simulation. Spectre X provides improved performance and scaling over Spectre APS, as can be seen by comparing 8T, 16T, and 32T. When you add 2 and 4 hosts with 32 cores, the simulation scales up to almost 5X over 8T. This case shows over 30X performance gain versus a single thread. In some cases, you can get up to 60X performance improvement using 128 cores.
The chart below shows a similar performance improvement when running a distributed harmonic balance analysis. You can now include a large number of additional harmonics, improve the performance, and retain the accuracy.
Another benefit of a Spectre X distributed simulation is increased capacity, because the memory required on each host involved in the simulation is reduced. With distributed harmonic balance analysis, the per-host memory reduction is roughly linear up to 4 hosts, which gives you the ability to simulate using 4X the memory.
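As a rough planning aid, the "roughly linear up to 4 hosts" statement can be turned into a back-of-the-envelope estimate. This is an idealized sketch only: it assumes perfectly linear scaling, refuses host counts above 4 because the text makes no claim beyond that, and uses a hypothetical function name and a made-up 512 GB figure purely for illustration.

```shell
#!/bin/sh
# Idealized estimate of per-host memory for a distributed harmonic balance run.
# per_host_mem_gb is a hypothetical helper; real memory use will deviate from it.
# Usage: per_host_mem_gb <single_host_memory_gb> <hosts>
per_host_mem_gb() {
    total=$1
    hosts=$2

    # The linear behavior is only described for up to 4 hosts.
    if [ "$hosts" -lt 1 ] || [ "$hosts" -gt 4 ]; then
        echo "estimate only covers 1 to 4 hosts" >&2
        return 1
    fi

    # Round up so the estimate never understates the requirement.
    echo $(( (total + hosts - 1) / hosts ))
}

# Illustrative numbers only: a run needing 512 GB on one host, spread over 4 hosts.
per_host_mem_gb 512 4
```

Read the other way around, the same relation is what lets 4 hosts with a given amount of memory each handle a simulation needing about 4X that amount.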
To improve Spectre X performance, some farm and machine settings have been found to be optimal:
For more information on Spectre X, refer to the following:
About Spectre Tech Tips
Spectre Tech Tips is a blog series aimed at exploring the capabilities and potential of Spectre®. In addition to providing insight into the useful features and enhancements in Spectre, this series broadcasts the voice of different bloggers and experts, who share their knowledge and experience on all things related to Spectre.