One constant in the semiconductor and EDA industries is, of course, Moore's Law. Another is the continual need for improved accuracy and performance. Accuracy is needed to take account of the effects that we used to be able to ignore. Performance is needed to handle the constantly increasing design sizes.
In this post, I am going to talk about what Cadence calls "Digital Full Flow". This is not a product name, but rather an umbrella term that covers about a dozen different products, as shown in the diagram below. These products, working together, deliver world-class PPA results. The core engines for all tools are threaded and distributed. There is concurrent power, performance, and area (PPA) optimization, with shared engines for placement, routing, extraction, and delay calculation, along with integrated STA and EMIR signoff.
This flow is in production deployment at 17 of the top 20, and has been used in over 150 7nm tapeouts. I don't know what happened to the notion that design starts at advanced nodes were going to sharply decline, but all of us who said that (including me) were completely wrong. Actually, not completely, since one of the other predictions was that older nodes would remain popular for a long time, and that has certainly happened. I believe 28nm is still the biggest node for design starts, since it is the last node without the complexity of double patterning, FinFETs, high resistance interconnect, and other artifacts of more advanced processes.
Well, just because something is good doesn't mean that we can't make it better. We have tied the engines at the core of the flow even more tightly together in three key ways: iSpatial, GigaOpt Machine Learning, and Compus.
First, Genus and Innovus have been combined with GigaOpt throughout the flow, using what we call iSpatial technology. I wrote about this before, when Chuck Alpert gave a preview of some of it at last year's CDNLive, in my post Genus and Innovus: Together at Last:
In the very old days, twenty years ago, synthesis tools just did synthesis, and place & route tools took the resulting netlist and placed and routed it. Synthesis didn't consider placement; timing was estimated using wire-load models. In my Ambit days, I used to liken this to estimating the time to travel between a set of cities in the US without knowing which cities they were. As Moore's Law advanced, it became hard to close timing with wire-load models and with place and route sticking exactly to the netlist it was given. So synthesis took placement into account (it knew which cities they were). And place and route would rebuffer signals to close timing. This involved switching buffers in the netlist (and perhaps other gates) for larger ones with higher drive strength, and also adding additional buffers to long signal lines.
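To make the cities analogy concrete, here is a minimal sketch in Python of the two kinds of estimate. It is not how Genus or Innovus work internally. The fanout table, the pin coordinates, and the function names are all hypothetical, and half-perimeter wirelength (HPWL) is used only as a common textbook stand-in for a placement-aware estimate.

```python
# Hypothetical wire-load table: fanout -> estimated wire length in microns.
# A wire-load model knows only how many sinks a net has, not where they are.
WIRE_LOAD_TABLE = {1: 10.0, 2: 20.0, 3: 32.0, 4: 45.0}


def wire_load_length(fanout):
    """Estimate wire length from fanout alone, clamping to the table range."""
    key = max(1, min(fanout, max(WIRE_LOAD_TABLE)))
    return WIRE_LOAD_TABLE[key]


def hpwl(pins):
    """Half-perimeter wirelength of the bounding box around (x, y) pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


if __name__ == "__main__":
    # Two made-up nets, each with one driver and two sinks (fanout 2).
    local_net = [(0, 0), (5, 3), (2, 8)]
    cross_chip_net = [(0, 0), (900, 40), (300, 700)]

    for name, pins in [("local", local_net), ("cross-chip", cross_chip_net)]:
        fanout = len(pins) - 1
        print(name, "wire-load:", wire_load_length(fanout), "HPWL:", hpwl(pins))
```

Both nets have the same fanout, so the wire-load table gives them the same length, while the placement-aware estimate separates the short local net from the one that crosses the chip. That gap is what made timing closure with wire-load models increasingly difficult.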
iSpatial takes this even further, bringing more awareness of the physical design into synthesis to improve the netlist due to better estimation, and adding more of the techniques used in synthesis (Genus) into place and route (Innovus). If you know what it is, that means moving some of the restructuring into physical design, when a lot more is known. If you break the flow down into some of the individual steps like mapping and clock-tree optimization, then you get a single synthesis and implementation flow, that looks like this:
This iSpatial technology halves the turnaround time and delivers a 5% improvement in PPA.
Machine learning in EDA can be used in two ways, which we call ML inside and ML outside. ML outside is capturing design and designer knowledge to optimize the flow. ML inside is selecting the best computational algorithms to deliver better PPA faster. In the case of the digital full flow, this allows for a major reduction in pessimism. The PPA gets better with each run through the tools. The numbers are quite big, with worst negative slack (WNS) declining by 25% or more, and total negative slack (TNS) by 50% or more.
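For readers who don't live in timing reports, here is a minimal sketch of how WNS and TNS are typically derived from per-endpoint slacks. It assumes a common convention: WNS is the most negative endpoint slack (reported as 0 when nothing violates) and TNS is the sum of all negative slacks. The function name and the slack values (in picoseconds) are made up for illustration and are not results from any Cadence run.

```python
def wns_tns(slacks):
    """Return (WNS, TNS) for a list of endpoint slacks.

    WNS: worst (most negative) slack, or 0 if no endpoint violates.
    TNS: sum of all negative slacks, or 0 if no endpoint violates.
    """
    negative = [s for s in slacks if s < 0]
    if not negative:
        return 0, 0
    return min(negative), sum(negative)


def improvement_pct(before, after):
    """Percentage reduction in the magnitude of a negative-slack metric."""
    if before == 0:
        return 0.0
    return 100.0 * (abs(before) - abs(after)) / abs(before)


if __name__ == "__main__":
    # Illustrative endpoint slacks in picoseconds for two runs.
    run1 = [-400, -100, 50, -300, 200]
    run2 = [-300, 0, 50, -100, 200]

    wns1, tns1 = wns_tns(run1)
    wns2, tns2 = wns_tns(run2)
    print("WNS improvement %:", improvement_pct(wns1, wns2))
    print("TNS improvement %:", improvement_pct(tns1, tns2))
```

TNS can improve by a larger percentage than WNS because it accumulates gains across every violating endpoint, whereas WNS reflects only the single worst one.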
This all comes about due to:
Compus is an optimization technology that is mostly applicable to datapath designs. Modern SystemVerilog is written at a fairly high level, which gives the synthesis tool a large architectural space to explore. Depending on the micro-architecture selected, implementation might vary from, say, less than 30,000 instances to nearly 100,000 instances. That's a big difference.
Compus is an advanced structuring optimization that performs transformations on the CDFG (control-dataflow graph) as well as RTL transforms and transforms specifically for datapaths. Depending on the timing constraints, this can produce very different results, in a way that cannot be achieved with netlist-level transformations alone.
The table below shows some results on various designs:
Result: 10-20% better PPA with improved turnaround time.
See the Digital Design and Signoff product page.
There is (or soon will be) a white paper accessible from that page, "Designing for Next-Generation PPA Requirements".