At CDNLive in Bengaluru (fka Bangalore), Cadence announced the Legato solution for smooth memory design. The press release went out that morning, and Vinod Kariat talked about it in the technical keynote. Later, during the technology update, Joy Han did a deeper dive into what was under the hood. If you have been around a long time, you will remember that Cadence products used to have names associated with music. A few remain: Virtuoso, Allegro, and of course the Cadence name itself. The current naming scheme has digital design and signoff products ending in "us" and verification products ending in "um". But the custom and PCB areas still sometimes keep the music tradition going, as they are doing today.
In music, legato means "smooth" in the sense of playing the notes without any gaps between. There is something of that in the Legato Memory Solution. Just as the three notes on the left are tied, to indicate they should be played legato, Legato ties together three key parts of any memory solution.
Memories and the tools used to create them are really important. As the diagram above shows, there are three main reasons for this.
First, memory dominates die area, and it is becoming more dominant, not less, as you can see by the expanding green bar in the graph above.
Second, the number of corners required for characterization in a modern process has exploded over the last decade, made worse by the need to produce advanced on-chip variation (AOCV) values for the library files.
And third, while there may be some economic questions about Moore's Law, the basic principle of 59% CAGR in the number of transistors (or memory bits) continues for leading-edge designs. But design productivity is growing at only around half that rate, so the third driver is a widening gap that requires automation to close.
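To make the gap concrete, here is some back-of-the-envelope arithmetic. The growth rates are the ones quoted above (59% CAGR for capacity, roughly half that for productivity, taken here as 29%); the ten-year horizon is just an illustrative choice.

```python
# Illustrative arithmetic for the design-productivity gap. Rates are the
# ones quoted in the text: ~59%/year transistor (or bit) growth, and
# design productivity growing at roughly half that rate (~29%/year here).

years = 10
capacity_growth = 1.59 ** years        # how much bigger designs get
productivity_growth = 1.29 ** years    # how much more a team can design

gap = capacity_growth / productivity_growth
print(f"Capacity grows {capacity_growth:.0f}x, productivity {productivity_growth:.0f}x")
print(f"Gap after {years} years: {gap:.1f}x, to be closed by automation")
```

After a decade the compounding leaves roughly an 8x shortfall, which is the space that tools like Legato have to fill.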
The Legato Memory Solution ties together memory cell design, memory array/compiler verification, and memory characterization. A key aspect, as with all recent Cadence products in many different areas, is that there are common engines under the hood, so no accuracy is lost to mismatches between different engines: the Spectre-based tools share the same device models and the same netlist parser.
Cell design is powered by Monte Carlo analysis on Spectre APS. In modern processes, the variation analysis is complex. Distributions are not Gaussian, and fine-grained analysis is necessary because weird anomalies can happen, such as the typical value not lying between the worst case and the best case for some parameter.
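A toy sketch can show why corner-only analysis misses these anomalies. This is not Spectre APS, just an illustrative Monte Carlo loop over a hypothetical delay model chosen to be non-monotonic in a process parameter, so the typical delay does not lie between the corner delays.

```python
import random

# Toy Monte Carlo sketch (illustrative only, not Spectre APS): a
# hypothetical cell delay that depends non-monotonically on a process
# parameter p, so the "typical" delay is not bracketed by the corners.

def delay(p):
    # Assumed model: delay is minimal near the nominal value p = 0.5
    return 100.0 + 80.0 * (p - 0.5) ** 2   # picoseconds

random.seed(1)
samples = [delay(random.gauss(0.5, 0.1)) for _ in range(10_000)]

typical = delay(0.5)                  # nominal-corner delay
fast, slow = delay(0.2), delay(0.8)   # two extreme parameter corners

print(f"typical = {typical:.1f} ps, corners = {fast:.1f}/{slow:.1f} ps")
print(f"MC spread: min {min(samples):.1f} ps, max {max(samples):.1f} ps")
# Here the typical delay sits below BOTH corner delays, exactly the kind
# of anomaly that fine-grained Monte Carlo analysis exposes.
```

The point of the sketch is only the shape of the problem: when the response is not monotonic in the varying parameter, sampling the full distribution is the only safe way to find the real worst case.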
Array/compiler verification can be very time-consuming since the netlists are inherently large (unlike cell design, which involves just a single cell or perhaps a handful). Spectre XPS handles this by automatically identifying the critical nets for a given simulation, and then aggressively performing RC reduction on all the non-critical nets. As a result, the simulation time-step can be increased in the non-critical areas (while still being tightly controlled on the critical ones), leading to a big increase in performance. In general, when accessing a memory, only a single bit line and a single word line are in use, and perhaps a few more are critical. Most of the array is thus non-critical and can be reduced without impacting accuracy.
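The partitioning idea can be sketched in a few lines. This is not the Spectre XPS algorithm, just a toy illustration of how few nets are actually exercised by a single access to a hypothetical 512x256 array:

```python
# Toy sketch of the critical/non-critical split (not the Spectre XPS
# algorithm): for one access, only the addressed word line and bit line
# are critical; everything else is a candidate for RC reduction and a
# coarser simulation time-step.

def partition_nets(n_rows, n_cols, row_addr, col_addr):
    critical, reducible = [], []
    for r in range(n_rows):
        (critical if r == row_addr else reducible).append(f"wl{r}")
    for c in range(n_cols):
        (critical if c == col_addr else reducible).append(f"bl{c}")
    return critical, reducible

# Hypothetical 512-row x 256-column array, accessing row 17, column 3
critical, reducible = partition_nets(512, 256, row_addr=17, col_addr=3)
print(f"{len(critical)} critical nets, {len(reducible)} reducible nets")
```

Even in this crude model, 2 nets are critical and 766 are reducible, which is why aggressive reduction of the non-critical portion pays off so well.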
The third step is generating timing and power models for memories to be used in the digital implementation and signoff flow.
Another new development is Super Sweep Technology. When characterizing a cell at a lot of different corners (different temperatures, slews, loads, and so on), there is actually a lot that is common, or close enough, between the different runs that intermediate data can be reused. The above diagram should make it clearer. On the left is the way it used to be done, with every simulation starting afresh and not reusing any data from other simulations. On the right is how it works in Legato, where intermediate results from some simulations can be reused in other simulations. The blue boxes that don't appear on the right all represent time saved.
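The reuse idea is essentially memoization across the corner sweep. The sketch below is only an analogy for Super Sweep, not its implementation; it assumes, for illustration, that an expensive operating-point step depends only on voltage and temperature, so it can be shared across all slew/load combinations at the same (V, T):

```python
from functools import lru_cache

# Memoization analogy for Super Sweep (illustrative only). Assumption:
# the expensive "operating point" step depends only on (voltage, temp),
# while slew and load only affect the cheap final step, so the expensive
# step can be computed once per (V, T) pair and reused.

solves = {"count": 0}

@lru_cache(maxsize=None)
def dc_operating_point(voltage, temp):
    solves["count"] += 1             # expensive: done once per (V, T)
    return voltage * 1000 + temp     # stand-in for real solver output

def characterize(voltage, temp, slew, load):
    op = dc_operating_point(voltage, temp)   # reused across slew/load
    return op + 0.1 * slew + 0.2 * load      # stand-in for transient run

corners = [(v, t, s, l) for v in (0.72, 0.80) for t in (-40, 125)
           for s in (10, 50, 200) for l in (1, 4, 16)]
results = [characterize(*c) for c in corners]
print(f"{len(corners)} corner runs, {solves['count']} expensive solves")
```

Here 36 corner runs share just 4 expensive solves; the 32 avoided solves correspond to the blue boxes that disappear from the right-hand side of the diagram.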
Later in the day at CDNLive, Invecas presented their experience with generating AOCV tables for memories using part of the Legato solution.
They looked at three approaches to characterizing memories.
The first was to use the full memory instance and brute-force simulation. This is computationally very expensive. For a 1 megabit memory (16K words of 40 bits) there are 7.4M devices, 22.4M nodes, and 22M RC elements. A full analysis of the CLK->Q arc with 300 Monte Carlo runs requires 350 hours (about two weeks) and about 50GB of memory.
The second approach is to create critical-path schematics for each component to be simulated and run Monte Carlo on those. One immediate disadvantage of this approach is that it takes engineering effort to create the schematics, and since they are done by hand there is always the risk of errors. It also still takes too much time and memory, although not as much as the first case. There are only 17K devices, 317K nodes, and 231K RC elements. The run-time is 84 hours (3.5 days), with a memory requirement of 10GB, for 1000 Monte Carlo runs.
The third approach is to use Liberty MX. The flow is in the above diagram.
The result is that the number of devices is just 560, with 12K nodes and 12K RC elements. Run-time is 1.45 hours (for 1000 Monte Carlo runs) and 1GB of memory. This is dramatically faster than the other approaches and the memory requirement is modest enough to run on any generic server.
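Since the three approaches ran different numbers of Monte Carlo iterations, the fairest comparison is per-run. The numbers below are exactly those reported in the presentation; only the per-run normalization is added:

```python
# Per-run comparison of the three characterization approaches, using the
# run-times, Monte Carlo counts, and memory figures quoted above.

approaches = {
    "full instance": {"hours": 350.0, "mc_runs": 300,  "mem_gb": 50},
    "critical path": {"hours": 84.0,  "mc_runs": 1000, "mem_gb": 10},
    "Liberty MX":    {"hours": 1.45,  "mc_runs": 1000, "mem_gb": 1},
}

baseline = approaches["full instance"]
base_per_run = baseline["hours"] / baseline["mc_runs"]

for name, a in approaches.items():
    per_run = a["hours"] / a["mc_runs"]
    print(f"{name:14s}: {per_run * 3600:7.1f} s/run, "
          f"{base_per_run / per_run:6.1f}x vs full instance, "
          f"{a['mem_gb']} GB")
```

Normalized this way, Liberty MX comes out roughly 800x faster per Monte Carlo run than full-instance simulation, and about 58x faster than the hand-built critical-path approach, on top of the 50x smaller memory footprint.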