Genus and Innovus: Together at Last

16 Apr 2019 • 5 minute read

Yesterday, in Qualcomm: Bring Me My Snapdragons, I wrote about a presentation at CDNLive Silicon Valley where Pavan Patibanda explained the methodology that Qualcomm uses for designing Snapdragon SoCs. The following presentation was by Chuck Alpert, the Genus R&D lead at Cadence, who talked about what the latest version of Genus is doing to address similar issues. The title of his presentation was Genus Synthesis Solution 19.1: New Architectural Compiler Technology Delivers the Best PPA for Advanced IP.

The short answer: doing similar things to what Pavan was doing, but linking up Genus and Innovus more tightly under the hood. In particular (spoiler alert) moving some of the re-synthesis capabilities of Genus into Innovus.

Just a quick summary of the vision for Genus:

Best runtime and capacity through massive parallelism
Best predictability through common engines
Best PPA through early physical and analytical datapath optimization

WNS and TNS

Before I go any further, let me explain the two important acronyms in synthesis, WNS and TNS. The S in both stands for "slack" so let me explain that with a little analogy.

You are having a dinner party, and you want all the guests to arrive at 7pm precisely. You are arranging Uber rideshares for everyone, so you need to tell the Uber drivers what time to pick up each guest. Some live nearby, some live miles away. There is risk of congestion in places. If a guest arrives five minutes early, that is five minutes of positive slack. If a guest arrives five minutes late, that is five minutes of negative slack. Obviously, synthesizing a multi-million gate block is more complicated than this, but synthesis is a bit like calculating all the times for your guests to leave, given where they live (placement) and likely traffic (routing congestion). Okay, I'm not going to push that analogy too far, but now you know what slack is if you didn't before.

An ideal synthesis result is a little like Oliver Wendell Holmes' one-horse-shay that ran for a hundred years and a day, and then every part broke at the same time. In synthesis, you want every path to have zero slack (exactly meet its timing constraint), so every signal arrives at exactly the right moment. Any signal that arrives early (positive slack) is like a part on the one-horse-shay that didn't break: it was over-engineered. In synthesis, this means area and power are being wasted to get the signal there unnecessarily fast. Any signal that arrives late (negative slack) is like a part on the shay that broke before a hundred years was up: it needs to be beefed up. Of course, just as it is impossible in real life to build a cart where all the parts break after exactly 100 years, it is also impossible to synthesize a design so every path has exactly zero slack.

WNS stands for worst negative slack: it is the slack of the critical path, the path through the design that misses timing by the most. TNS (total negative slack) is the sum, over all the paths through the design that miss timing, of how much each one misses by. And no, you don’t get credit for positive slack, since a path that misses timing cannot be compensated by another path where the signal arrives early. If a guest is late to your dinner party, it isn't compensated by someone who arrived while you were still in the shower.
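To make the arithmetic concrete, here is a minimal sketch of the two definitions above. The path names, the 1.0ns requirement, and the arrival times are all made up for illustration, and the helper functions are hypothetical rather than anything from Genus.

```python
from typing import Dict, Tuple


def slack(required_ns: float, arrival_ns: float) -> float:
    """Positive when the signal arrives early, negative when it is late."""
    return required_ns - arrival_ns


def wns_tns(slacks: Dict[str, float]) -> Tuple[float, float]:
    """Return (WNS, TNS) for a set of path slacks.

    Only negative slacks count: paths with positive slack earn no credit.
    If nothing misses timing, both values are reported as 0.0.
    """
    negative = [s for s in slacks.values() if s < 0]
    wns = min(negative, default=0.0)   # the single worst miss
    tns = sum(negative, 0.0)           # all the misses added together
    return wns, tns


# Hypothetical paths, each required to arrive within 1.0 ns.
paths = {
    "a_to_reg1": slack(1.0, 0.75),   # early: +0.25 ns
    "b_to_reg2": slack(1.0, 1.5),    # late:  -0.5 ns
    "c_to_reg3": slack(1.0, 1.25),   # late:  -0.25 ns
}

wns, tns = wns_tns(paths)
print(f"WNS = {wns} ns, TNS = {tns} ns")
```

The early path contributes nothing to either number, which is the dinner-party point: the guest who arrived early does not cancel out the two who arrived late.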

These are the two best numbers for giving you an idea of how "good" the results are, along with area which needs no explanation. The designers need to look into more details than just these two numbers, of course, but when people talk about QoR (quality of results) they often just mean these three measures: WNS, TNS, and area.

Genus

Genus has a timing-driven partition algorithm that makes sure the WNS path doesn’t get split across partitions, since a critical path that is split between partitions on different machines is unlikely to be optimized well. The algorithm also tries to split the partitions so that they all have similar TNS (within that partition), so each partition has a roughly equivalent amount of work to be done. The partitions are then run on separate CPUs. They are independent, with no locking and waiting, even if two partitions happen to be on the same machine.
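The talk did not go into how the partitioner works internally, so the following is only a toy illustration of the balancing goal, not the Genus algorithm. It assumes the design has already been grouped into clusters, with the WNS path kept whole inside one cluster, and it uses a simple greedy heuristic (largest cluster first, always into the lightest partition). The cluster names and TNS magnitudes are invented.

```python
import heapq
from typing import Dict, List, Tuple


def balance_by_tns(
    clusters: Dict[str, float], num_partitions: int
) -> Tuple[List[List[str]], List[float]]:
    """Greedily assign clusters to partitions so TNS totals come out similar.

    clusters maps a cluster name to the magnitude of its TNS in ns.
    Each cluster is atomic, so a critical path that lives inside one
    cluster can never be split across partitions.
    """
    # Min-heap of (TNS assigned so far, partition index).
    heap = [(0.0, i) for i in range(num_partitions)]
    heapq.heapify(heap)
    parts: List[List[str]] = [[] for _ in range(num_partitions)]

    # Place the biggest clusters first, each into the lightest partition.
    for name, load in sorted(clusters.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        parts[idx].append(name)
        heapq.heappush(heap, (total + load, idx))

    totals = [0.0] * num_partitions
    for total, idx in heap:
        totals[idx] = total
    return parts, totals


# Hypothetical clusters; "wns_path" holds the whole critical path.
clusters = {"wns_path": 4.0, "alu": 3.0, "decoder": 2.0, "fifo": 2.0, "ctrl": 1.0}
parts, totals = balance_by_tns(clusters, 2)
print(parts, totals)
```

With these numbers, both partitions end up with 6.0ns of TNS to work on, so neither CPU sits idle waiting for the other.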

As to common engines, it’s what it says on the can. Genus, Innovus, and Tempus all have common timing engines. Genus and Innovus have common placement engines and common global routers. And so on for the other engines.

It is good to have a rough (over-)simplified view of how synthesis is actually done. The RTL is read in and turned into a control-dataflow graph (don't worry if you don't know what that is), which is then turned into generic gates. These function like real gates, but at this stage we're not even worrying about the actual cells in the library; that comes later. Generic gates, in fact, are often nowhere close to anything that actually exists in the library, such as a 64-bit AND gate. This generic netlist then goes through structuring, a first level of optimization, and then mapping (sometimes called technology mapping) to get to actual library standard cells. Then the cells are placed, and physically-aware resynthesis does its best to make the netlist meet timing, along with resizing and rebuffering (so a NAND gate might be switched for one with higher drive and more area, or extra buffers might be inserted). This last step, resizing and rebuffering, has also been a change that Innovus could make during place and route.
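As a small illustration of why a generic gate like a 64-bit AND cannot go straight into the netlist, here is a sketch that breaks a wide AND into a tree of gates with a limited number of inputs. The maximum fan-in of 4 is an assumption made for the example, and a real mapper chooses cells based on timing, area, and power rather than simply filling every gate.

```python
from typing import Tuple


def map_wide_and(num_inputs: int, max_fanin: int) -> Tuple[int, int]:
    """Decompose an n-input AND into a tree of gates with limited fan-in.

    Returns (number of gates, number of logic levels).
    """
    if max_fanin < 2:
        raise ValueError("a gate needs at least two inputs")
    signals = num_inputs
    gates = 0
    levels = 0
    while signals > 1:
        full, rem = divmod(signals, max_fanin)
        # Every full group becomes one gate. Two or more leftover signals
        # share a smaller gate; a single leftover passes to the next level.
        gates += full + (1 if rem > 1 else 0)
        signals = full + (1 if rem >= 1 else 0)
        levels += 1
    return gates, levels


print(map_wide_and(64, 4))   # 4-input ANDs
print(map_wide_and(64, 2))   # 2-input ANDs only
```

With 4-input gates the 64-bit AND needs 21 gates in 3 levels. With only 2-input gates it needs 63 gates in 6 levels. That trade-off between gate count and logic depth is the kind of choice that structuring and mapping have to make.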

Traditionally, this order (synthesize, place, re-synthesize) has a couple of big disadvantages. Firstly, doing the initial synthesis without being aware of placement risks getting stuck in local minima. Secondly, even if the re-synthesis phase finds a good solution, it is a very inefficient way to get there. It takes a lot of work (runtime) in re-synthesis to correct a poor choice made during the initial synthesis.

Genus can do early physical design, taking placement into account from the beginning in “one shot” with limited re-synthesis: physically aware generic netlist generation, physically aware mapping, and physically aware optimization. Then Innovus, with the same engines, takes over, legalizes the placement, and so on.

But that was then. Designs have got larger (duh!) but in particular RTL designers need to improve their productivity and write fewer lines of code that do more. They can’t afford to spend a lot of time on hand optimizing the RTL to be really explicit to the synthesis tool. The more powerful the synthesis tool is, the more designers can just write RTL that expresses what the design needs to do and leave it to the tool to work out how to implement it. There is no point in second-guessing a tool with great optimization. Best case, you are wasting time. Worst case, you over-constrain the tool to do something sub-optimal.

Tomorrow

In tomorrow's post, we'll dive a bit deeper into what actually goes on in the latest 19.1 version of Genus (and Innovus).
