Pegasus Flies to the Clouds

11 Apr 2017 • 5 minute read

There is a famous line in Lewis Carroll's Through the Looking-Glass that could have been written by the team leader of a design rule check (DRC) product:

"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"

With every process node, the number of design rules to be checked roughly doubles. To make it worse, the complexity of those rules increases too, in two ways. First, some rules depend on a real physical dimension (those associated with some lithography effects, for example), meaning that as the dimensions shrink, a greater number of polygons come into range. Second, the rules just get inherently more complex. In the early days of ICs, almost all rules were simple minimum width, minimum spacing, or enclosure rules. There were no rules that depended on the current direction in interconnect, no metal reflection rules, no rules due to OPC whereby some dimensions are simply not allowed, no coloring rules for multiple patterning. Since computers are not getting a lot faster any more (largely because Dennard scaling is over, and we can blame that on semiconductors too), the effect is that DRCs take longer and longer to run.
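To make the "simple" end of that spectrum concrete, here is a minimal sketch of a minimum-width and minimum-spacing check on axis-aligned rectangles. Everything in it (the Rect type, the check function, the nanometre units, the rule values) is a hypothetical illustration, not how any production DRC tool or foundry deck is written.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle on one layer (hypothetical units: nanometres)."""
    x1: int
    y1: int
    x2: int
    y2: int


def width(r):
    """Smaller of the two side lengths."""
    return min(r.x2 - r.x1, r.y2 - r.y1)


def spacing(a, b):
    """Euclidean gap between two rectangles; 0 if they touch or overlap."""
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0)
    return (dx * dx + dy * dy) ** 0.5


def check(rects, min_width, min_spacing):
    """Return a list of violations as tuples naming the rule and shape indices."""
    violations = []
    for i, r in enumerate(rects):
        if width(r) < min_width:
            violations.append(("width", i))
    # Naive all-pairs comparison: quadratic in the number of shapes.
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        gap = spacing(a, b)
        if 0 < gap < min_spacing:
            violations.append(("spacing", i, j))
    return violations


shapes = [Rect(0, 0, 100, 20), Rect(0, 50, 100, 70), Rect(0, 80, 100, 95)]
result = check(shapes, min_width=20, min_spacing=20)
print(result)
```

Even in this toy version, the spacing rule compares every pair of shapes, so the work grows much faster than the shape count. Rules with a fixed physical range only make that worse as shapes shrink and more of them fall inside the range.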

Just like there are only three numbers in computer science (0, 1, ∞), there are only four runtimes in EDA:

1. It's fast enough to run while I go and get a cup of coffee or reply to an email...I wouldn't complain if it were faster but not an issue
2. It runs over lunch, so I get two iterations a day (lunch, night)
3. It runs overnight, so I get one iteration a day
4. It takes days, we do anything we can to avoid running the whole job, but at signoff it is unavoidable

It goes without saying that 1-2 and even 3 are much more desirable than 4. But the reality is that signoff DRCs cannot be run overnight any more (three days is not uncommon) and even sub-decks take more than 24 hours to run. To make it worse, DRCs require machines with large amounts of memory and large numbers of cores. These are expensive to buy and so are rare. The typical job schedulers used, such as LSF or RTDA, aren't able to handle them gracefully. If you need four machines with huge memory capacity at the same time, then the job scheduler has to hold from one to three idle until the fourth is finally available (that's not quite true, but near enough). That wastes some of the most expensive compute resource while waiting, and since there are so few machines like that, the delay to even start a job can be significant.
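The cost of holding machines idle is easy to put a number on. The sketch below assumes the simplest possible policy, which as noted is not quite what real schedulers do: each big-memory machine is reserved the moment it frees up, and the job starts only when the last one arrives. The function name and the release times are made up for illustration.

```python
def gang_start_and_idle(free_at_hours):
    """Given the hour at which each required machine becomes free,
    return (job start hour, total machine-hours spent idle while waiting)."""
    start = max(free_at_hours)
    idle = sum(start - t for t in free_at_hours)
    return start, idle


# Hypothetical example: four big-memory machines free up at hours 0, 2, 5 and 9.
start, idle = gang_start_and_idle([0, 2, 5, 9])
print(f"job starts at hour {start}, {idle} machine-hours idle")
```

In this made-up case the job waits nine hours to start, and the three machines that arrived early sit unused for a combined 20 machine-hours.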

What's needed is a DRC that makes a different tradeoff: one that can run on a very large number of ordinary servers, either in an internal server farm (some companies have server farms with literally more than 100,000 cores) or out in the public cloud where the resources available are practically unlimited.

Pegasus to the Rescue

Pegasus is a winged horse in Greek mythology, the son (foal?) of Poseidon and Medusa. He also ends in the letters "us" like all of the new family of Cadence digital implementation and signoff products. Given the introduction to this post, you can probably guess that Pegasus is a new DRC. The current Cadence physical verification product has the marketing-free name of PVS, for Physical Verification System. Today, at CDNLive Silicon Valley, Anirudh Devgan announced Pegasus, officially the Pegasus™ Verification System. I'm just going to call it Pegasus (since the whole name is unwieldy, and the initials are ambiguous).

Pegasus works in both the analog/custom environment, where it is seamlessly integrated into the Virtuoso platform, and the Innovus implementation system. A key point is that it uses the existing foundry-certified PVS decks.

Pegasus can take advantage of even more parallelism than its other "-us" brethren. It is the first solution to combine a pipelined infrastructure with stream and dataflow architecture, resulting in near-linear scalability onto hundreds of CPUs. It is cloud-ready, able to run on internal server farms or on commercial external clouds such as AWS. The license server doesn't need to run in the cloud, meaning that it is straightforward to incrementally add huge resources from public clouds during peak usage periods such as final DRCs coming up to tapeout.

One of the early customers has been Texas Instruments, and they have successfully used Pegasus on a large number of CPUs to drastically reduce full-chip DRC runtime compared to their existing solution. Another early customer was Microsemi, who found that jobs that previously ran for over 24 hours can complete in just a few hours. The result is shown in the diagram below, where without Pegasus expensive delays can be expected after timing closure, whereas with Pegasus the DRC and final ECO cycle is fast and predictable.

Using 360 cores, speedups on real designs at three different customers varied from 6X to 12X. Perhaps more important is the chart below, which shows scalability continuing to increase from 160 to 320 to 640 cores. Many parallelized tools show impressive speedup over a few tens of cores and then additional cores either show no further improvement, or in some cases cause a slowdown as the job of coordinating the CPU resources becomes the bottleneck.
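Why do many parallel tools plateau and then slow down? A textbook way to see it is Amdahl's law with an extra term for coordination cost. The sketch below is a generic model with invented parameters (a 1% serial fraction and a small per-core overhead). It is not a model of Pegasus and does not reproduce the chart or any customer measurement.

```python
def speedup(cores, serial_fraction=0.01, overhead_per_core=1e-5):
    """Speedup relative to a normalized single-core runtime of 1.0.

    Runtime model: serial part + parallel part split across cores
    + a coordination cost that grows linearly with the core count.
    All parameter values are illustrative assumptions.
    """
    runtime = (
        serial_fraction
        + (1.0 - serial_fraction) / cores
        + overhead_per_core * cores
    )
    return 1.0 / runtime


for n in (10, 40, 160, 320, 640, 1280, 5120):
    print(f"{n:5d} cores -> speedup {speedup(n):6.1f}x")
```

With these invented numbers the speedup climbs through a few hundred cores and then falls as the coordination term dominates. With the overhead set to zero, the same model can never exceed 1 / serial_fraction, which is 100X here, no matter how many cores are added. Scaling that keeps improving out to hundreds of cores therefore implies both a very small serial fraction and very cheap coordination.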

So Pegasus allows companies to take advantage of large numbers of generic servers in internal server farms, scaling effectively to hundreds of cores. When internal resources are exhausted or unavailable, Pegasus can burst forth into the cloud.

Summary: Sunny with Cloudy Periods

Best-in-class scalability and performance
Cloud ready for internal and external clouds
Full-flow physical verification integrated with both Virtuoso and Innovus platforms
Uses the same foundry-certified rule decks