Paul McLellan
Tags: physical verification, CDNLive, Pegasus, cloud, Texas Instruments

TI Flies Pegasus in the Clouds

4 Jun 2018 • 5 minute read


Kyle Peavy of Texas Instruments reported on their experience with Pegasus, Cadence's newish physical verification system, at CDNLive Silicon Valley. You won't be surprised to know that TI's chips are getting more complex and larger, and that on its own leads to longer DRC run times, all other things being equal.

But all other things are not equal. As you can see from the charts below, each process generation brings more rules (roughly double) and more complicated rules. The combination of more complex chips and more complex rules meant that a full DRC run on a large 16nm chip took about 103 hours (over 4 days). Even sub-decks take over a day to run.

[Charts: DRC rule count and rule complexity by process node]

Even throwing more cores at the problem was inadequate. TI's experience on this benchmark chip was as follows (a rough tally of the total CPU-hours appears after the list):

  • voltage rules, 21 hours on 60 cores
  • multi-patterning technology rules, 22 hours on 64 cores
  • front end, 36 hours on...dum dum dum...260 cores
  • back end, 24 hours on those 260 cores
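
Adding up those four runs gives a sense of the total compute bill (figures straight from the list above):

```python
# Back-of-the-envelope tally of the CPU-hours reported above.
# Each entry is (deck, wall-clock hours, cores).
runs = [
    ("voltage rules", 21, 60),
    ("multi-patterning rules", 22, 64),
    ("front end", 36, 260),
    ("back end", 24, 260),
]

for name, hours, cores in runs:
    print(f"{name:>24}: {hours * cores:6,} CPU-hours")

total = sum(hours * cores for _, hours, cores in runs)
print(f"{'total':>24}: {total:6,} CPU-hours")  # 18,268 CPU-hours
```

That is over 18,000 CPU-hours for one pass, which is why reducing total CPU hours tops the wish list below.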

Another problem is that those cores don't come cheap. The preferred host requires large memory (256GB) and a large number of cores (16+). This is expensive, but also means that physical verification jobs can be blocked waiting for these big machines, while large numbers of small machines are available but not really up to the job.

"Pegasus Saves the Day"

If Kyle had his dream, he would want to:

  • reduce total CPU hours
  • scale to as many CPUs as he has available, with close-to-linear scaling even at very large numbers
  • use common servers, with a low number of cores per host and less memory
  • be cloud-ready, for when he doesn't have enough CPUs in the TI datacenters
  • use the existing qualified PVS decks

There are a couple of less obvious things on his wish list, too. The job should start when the first cores are available: even if the number allocated (say, 100 cores) turns out not to be available, the whole job should not block waiting for all the cores to come free. For one thing, that wastes a lot of CPU hours just idling, and for another, that 100th CPU might take a long, long time to free up (the sketch below puts a number on the waste). The other wish is that the system be fault tolerant. With hundreds of cores, stuff will crash sometimes, and even though that will lose some work that needs restarting, the job as a whole should not fail.
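
To put a hypothetical number on that idling cost, suppose 100 cores free up at a steady rate over two hours and the job cannot start until the last one arrives (the arrival rate here is made up for illustration):

```python
# Hypothetical illustration: 100 cores free up at a steady rate over
# 2 hours, but a blocking scheduler cannot start until the last one arrives.
cores = 100
window_hours = 2.0

# Core i frees up at t = (i / cores) * window_hours and then sits idle
# until the 100th core arrives at t = window_hours.
idle = sum(window_hours - (i / cores) * window_hours
           for i in range(1, cores + 1))
print(f"CPU-hours wasted idling: {idle:.0f}")  # 99 CPU-hours

# Starting work on each core as soon as it frees up wastes none of that time.
```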

Pegasus Requires Fewer Total CPU Hours

[Charts: Pegasus vs. PVS run-times on the same block, and Pegasus scaling vs. core count]

The above charts show the same deck, running the same design, on the same number of cores (on a smallish block). On 32 cores, Pegasus is 2.5X as fast, and on 72 cores it is 2.8X as fast. The scaling is not quite at the theoretical maximum, but all the way up to 72 cores it is pretty close (the red line on the graph on the right).

Pegasus Scales to Many CPUs Efficiently

[Chart: Pegasus run-time versus core count on a medium-sized block]

The above chart shows the same job with the run-time reduced by as much as 7X just by throwing more cores at it. This is a medium-sized block, the type that there are maybe 40 of on an SoC, so a job that runs all the time. All the way up to 64 cores there is good scaling. The efficiency drops a bit once run-times per core fall to about 45 minutes, since the overhead of setting up the job is no longer won back by the increased parallelism before the job finishes.
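
A simple fixed-overhead model captures why scaling tails off at short run-times. The setup cost and work size below are made-up numbers for illustration, not measurements from the talk:

```python
# Toy model: total run-time on n cores is a fixed setup cost plus the
# parallel work divided across the cores. Numbers are illustrative only.
setup_min = 10.0        # assumed per-job setup overhead, in minutes
work_core_min = 3000.0  # assumed total work, in core-minutes

def runtime(n_cores: int) -> float:
    return setup_min + work_core_min / n_cores

for n in (8, 16, 32, 64, 128):
    speedup = runtime(1) / runtime(n)
    efficiency = speedup / n
    print(f"{n:4d} cores: {runtime(n):7.1f} min, efficiency {efficiency:5.1%}")
```

Once the per-core run-time gets close to the setup cost, extra cores stop paying for themselves, which matches the roughly-45-minute knee Kyle described.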

Pegasus Uses Small Hosts

[Chart: Pegasus efficiency when cores are spread across small hosts]

The above chart shows how efficiently Pegasus can make use of machines with few cores (or where there are only a few cores free since the others are doing something else). The chart is not completely clear. The top 3 bars are all for 16 cores, but with those 16 cores coming 2 at a time, 4 at a time, or 8 at a time. The efficiency remains around 90%. The middle three bars are the same for 32 cores, and the lowest 3 bars for 64 cores. The takeaway from the whole chart is that efficiency stays around 90% even if only CPUs with low numbers of cores are available.

Pegasus Starts When the First Core Is Available

Work starts immediately. TI uses LSF to queue jobs, and too many runs in parallel can bottleneck LSF throughput. That is not a problem with Pegasus: the job starts the moment the first host is available, and as additional hosts come online, they are put to work right away. Kyle noted that "overall efficiency is not affected when some hosts come online late."
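
The pattern at work here can be sketched with a shared work queue. This is a toy illustration of the scheduling idea, not Pegasus internals, and the unit counts and timings are invented:

```python
import queue
import threading
import time

# Illustrative sketch only: work units sit in a shared queue, and each
# host starts pulling work the moment it comes online, so a late host
# simply contributes less instead of stalling the whole job.
work = queue.Queue()
for unit in range(200):              # e.g., 200 DRC work units
    work.put(unit)

def host(name: str, online_after_s: float) -> None:
    time.sleep(online_after_s)       # this host comes online late
    print(f"{name} online")
    while True:
        try:
            work.get_nowait()
        except queue.Empty:
            return                   # queue drained; nothing left to do
        time.sleep(0.01)             # stand-in for real DRC work
        work.task_done()

hosts = [threading.Thread(target=host, args=(f"host{i}", 0.5 * i))
         for i in range(4)]          # hosts trickle in every half-second
for t in hosts:
    t.start()
work.join()                          # the job completes once all units are done
print("all work units complete")
```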

Pegasus Is Cloud Ready

[Diagram: Pegasus running across the in-house datacenter and the public cloud]

Pegasus can "burst to the cloud" and run with a mixture of in-house datacenter and public cloud. TI doesn't actually use this since they have enough resources in-house, but Kyle acknowledged that if they get up to 1000 CPUs then they will need to. He pointed out, however, that for small companies that don't have huge compute farms this is really powerful. Even within TI, they seem able to farm out designs to unused resources, such as idle machines in their India datacenter.

Pegasus Is Not (Yet) Fault-Tolerant

OK, so Pegasus doesn't get a perfect score. But that is work-in-progress.

Just when you are "happily dreaming about another day of DRC fixing," a machine crashes. For now, you have to restart the job. But fault tolerance is a coming feature of Pegasus, which will recover from the loss of a host. "I'm looking forward to seeing this feature," Kyle said.
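
The standard way to get that resilience in a distributed job is to put a crashed host's work units back on the queue rather than failing the run. A minimal sketch of the idea (purely illustrative; the talk didn't describe how Pegasus will implement it):

```python
import random

# Purely illustrative: when a host "crashes" mid-unit, put that work unit
# back on the queue instead of failing the whole job.
random.seed(0)
pending = list(range(100))   # unfinished work units
done = set()

while pending:
    unit = pending.pop()
    if random.random() < 0.05:   # simulated host crash while running this unit
        pending.append(unit)     # the partial result is lost; retry the unit
        continue
    done.add(unit)               # unit completed on a healthy host

assert done == set(range(100))
print("job finished despite simulated crashes")
```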

Pegasus Summary

Pegasus runs with existing qualified rule decks, scales well to large numbers of CPUs, and doesn't require big, expensive servers—it can make do with everyday ones. It can even "fit in the cracks" of availability across an active compute farm.

When Pegasus is in the critical path to design closure "the designer can turn up the heat by raising the number of CPUs as appropriate. If not enough CPUs are available, Pegasus will do the best it can with what it gets."

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email