Use the Integrated Flow with US

9 Dec 2015 • 4 minute read

A couple of years ago, it was clear that the Cadence implementation flow required a from-the-ground-up re-creation. Nobody likes to say their old tools were not as good as they could be, but that is obviously the case when the new ones are better. And in this case they are much better.

The reality is that for the chips of that era they were fine. But chips continued to get larger, the number of corners required for analysis exploded, and power became a dominant requirement. Another big change was that design teams had access to large farms of servers and wanted to be able to leverage them to speed up working on these next-generation chips. A lot of things had changed and so a completely new approach was required: massively parallel, common engines, and re-written core engines for the modern era.

Probably the biggest issue was that, historically, each tool used "estimates" of how the following tools in the flow would behave. A good placer during synthesis was one that correlated with the actual placement, and good timing in physical design was one that gave a good estimate of what timing closure needed. This approach worked acceptably on smaller designs on non-leading-edge nodes, but not for the most demanding designs. The reality is that a good placement during synthesis is one that uses the identical engine, and good timing during physical design can only be achieved with the actual timing closure engine. So the engineering team decided to bite the bullet and create a family of unified engines that would be used throughout the flow:

Unified placement
Best-in-class power, performance, and area (PPA) optimization
Unified timing, power, and extract
Unified clock tree synthesis (CTS) and global router
Common data model
And a zillion more features; notably, the flow is being used at 10nm and fully supports multiple patterning and FinFETs

This solution would ensure the closest possible correlation as the design progressed between different stages. So the big changes were to build unified engines, make everything massively parallel, and create new core engines for best-in-class PPA at the end, the only QoR that really counts.

The first of the tools to be released was the Tempus Timing Signoff Solution for static timing signoff, followed by the Voltus IC Power Integrity Solution for EM/IR analysis and the Quantus QRC Extraction Solution. Next was the main Innovus Implementation System with new versions of the GigaPlace Engine and GigaOpt Optimizer. Just before DAC, we announced the Genus Synthesis Solution, and finally the Joules RTL Power Solution (somehow an "le" slipped into the middle of the "us") for RTL-level power analysis.

The result is:

10-20% better PPA
Up to 10X turnaround and capacity gain
Full flow correlation leading to design convergence
Reduced iterations and so earlier signoff

The above benchmark is a 5M-instance 1GHz GPU. It shows not just how good the tools are individually, but also that the full benefits of Innovus are only realized when Genus is used for synthesis. The main flow really divides into two parts: Genus and Innovus, the implementation flow, and the suite of signoff tools that work with them. But all the engines are common across both groups. There is also Joules for RTL power analysis, which doesn't quite fit into this taxonomy.

The above diagram shows what is under the hood in each of the signoff tools. But don't forget that the entire flow is linked by the common engines and data. Of course, it goes without saying that everything is color-aware (for multiple patterning), supports FinFETs, and is already being used for designs at 10nm.

It isn't just working from a technology point of view. It is also getting better designs into real products. For example, of the four benchmarks in the above chart, two of the design teams immediately converted to Cadence to get the improvements in reality, not just in an evaluation.

Many, if not most, designs these days have mixed-signal components. The "US" digital flow is also linked to Cadence's Virtuoso custom layout environment through the OpenAccess database. Funnily enough, this is something I launched back in about 2001, as a project we called SuperChip. I thought it would take three years, but I think in the end it took more like a decade. At the time, we combined digital and custom/analog engineering into one organization and I took over marketing for both product lines.

I have run engineering organizations and I know how difficult it is to pull together projects like this rewrite where there are so many moving parts. So buy the team a beer. That would have to be that famous stout, Guinus.

The annual Cadence Implementation Flow Summit is tomorrow, December 10. I'll be your host for the day. I hope to see you there. Registration is here although it will close later in the day so we can print badges. You can still register on site though.