Igor Keller gave an internal presentation on Handling Variability in the Modern Design Cycle last week. He is one of the key architects for the Tempus timing engine. However, since his presentation was mostly about the state of the art in static timing analysis, rather than anything confidential inside Tempus, he agreed to let me share it externally.
If you go back far enough, there was a world before static timing analysis (STA) when timing was verified using simulation. But as designs got larger, the design world realized that there was a lot to be gained by constraining the design style to use strictly synchronous logic. Then a more static approach could be used. This also corresponded to the era when synthesis became the dominant design methodology, which required similar timing under the hood to drive the optimization phase. Fundamentally, STA is about analyzing the circuit to ensure that the correct signal is latched. If a design is not perfect, then typically the errors are summed up in two numbers: the biggest miss, or worst negative slack (WNS); and the sum of the misses over all the signals that arrive too late to be latched, the total negative slack (TNS). These two numbers give a good feel for how close the design is to "meeting timing", which means TNS is zero and WNS is zero or positive (although large positive numbers can indicate a lot of over-design).
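As a minimal sketch of how these two numbers fall out of per-endpoint slacks, consider the following. The slack values are made up for illustration, and the convention that WNS is simply the smallest slack (so it can be positive when timing is met) is an assumption; tools differ on how they report it.

```python
def wns_tns(slacks):
    """Return (WNS, TNS) for a list of endpoint slacks, all in the same unit.

    WNS is taken as the smallest slack, which may be positive if timing is met.
    TNS is the sum of the negative slacks only, so it is zero when timing is met.
    """
    wns = min(slacks)
    tns = sum(s for s in slacks if s < 0)
    return wns, tns


# Hypothetical endpoint slacks in ns: two endpoints miss timing.
endpoint_slacks = [0.12, -0.05, 0.30, -0.20, 0.00]
wns, tns = wns_tns(endpoint_slacks)
print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns")
```

WNS says how bad the single worst miss is, while TNS gives a feel for how many endpoints are missing and by how much in aggregate.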
When STA was first introduced, cell timing models typically were slew-oriented. You can think of slew as the slope of the signal; in other words, how fast it is changing. The model would take input times and slews, and produce an output time and slew. Designs would be handled with graph-based analysis (GBA), in which each gate reduces all the input changes to a single output change. More accurate is path-based analysis (PBA), which preserves all the information from the different paths. But the number of paths blows up exponentially, so a combination of the two is needed for a practical level of accuracy and run-time.
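The difference between GBA and PBA can be illustrated with a toy example. The linear cell model below, its coefficients, and the arrival and slew values are all assumptions chosen for illustration; real cell models are table-based and far more detailed. The sketch assumes GBA keeps the worst arrival and the worst slew at each gate output, even when they come from different inputs.

```python
def gate(arrival, slew, base=0.10, k=0.5):
    """Toy cell model (assumed): delay and output slew grow linearly with input slew."""
    delay = base + k * slew
    out_slew = 0.05 + 0.2 * slew
    return arrival + delay, out_slew


# Two signals converge on gate G1, whose output drives gate G2.
# Each entry is (arrival time, slew) in ns; the values are made up.
g1_inputs = {"A": (1.00, 0.05), "B": (0.80, 0.30)}

# Evaluate every input-to-output arc of G1 separately.
g1_outputs = {pin: gate(arr, slew) for pin, (arr, slew) in g1_inputs.items()}

# GBA: collapse G1's output to a single event with the worst arrival
# and the worst slew, even though they come from different inputs.
gba_arrival_g1 = max(arr for arr, _ in g1_outputs.values())
gba_slew_g1 = max(slew for _, slew in g1_outputs.values())
gba_arrival, _ = gate(gba_arrival_g1, gba_slew_g1)

# PBA: carry each path's own arrival and slew through G2, then take the worst.
pba_arrival = max(gate(arr, slew)[0] for arr, slew in g1_outputs.values())

print(f"GBA arrival at G2 output: {gba_arrival * 1000:.1f} ps")
print(f"PBA arrival at G2 output: {pba_arrival * 1000:.1f} ps")
```

In this sketch GBA pairs the late arrival of input A with the slow slew of input B, a combination that no real path has, so it reports a later arrival than PBA. That pessimism is the price of doing one calculation per gate instead of one per path.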
So the three key technologies in STA are:
In advanced nodes, there are new phenomena and new challenges that make the simpler approach of the past inadequate. When we switched from planar to FinFET transistors, immediately we ended up with denser designs, higher clock speeds, and tighter metal pitches. At the same time, the needs of the mobile market limited power drastically (which means getting the supply voltage as low as possible). That means:
The old way of doing things could, of course, be guarded by pessimism as usual, but that would mean that it wouldn't be possible to design anywhere close to what the silicon was actually capable of. As always, the alternative to pessimism is accuracy. But addressing all these concerns is not simple.
In extreme cases, crosstalk can cause functional problems, but in less extreme cases it affects timing. If the victim net and the aggressor net change in the same direction then the signal can change faster; in opposite directions then slower. The challenge is that designs are extremely large and extremely non-linear. When looking at the whole circuit, there are both timing and logical constraints between victim and aggressor (for example, a trivial case would be if the victim and aggressor happen to be the input and output of an inverter; then we know they will never be moving in the same direction). The solution is costly non-linear optimization.
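A common first-order way to picture the timing effect is to scale the coupling capacitance by a switching-dependent factor. The sketch below is an idealized textbook-style approximation, not how a production engine works: the factors 0, 1, and 2 assume the aggressor switches at the same time and rate as the victim, the delay is a single lumped RC, and the component values are invented.

```python
import math

# Idealized coupling factors (assumed): how much of the coupling capacitance
# the victim driver effectively sees, depending on what the aggressor does.
COUPLING_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def victim_delay(r_drive, c_ground, c_couple, aggressor):
    """50% delay of a single lumped RC stage with a coupling-adjusted load."""
    c_eff = c_ground + COUPLING_FACTOR[aggressor] * c_couple
    return math.log(2) * r_drive * c_eff


# Hypothetical values: 1 kOhm driver, 10 fF to ground, 5 fF to the aggressor.
for mode in ("same", "quiet", "opposite"):
    d = victim_delay(1e3, 10e-15, 5e-15, mode)
    print(f"aggressor {mode:>8}: {d * 1e12:.2f} ps")
```

Even this crude model shows the victim delay swinging by a factor of two between the best and worst aggressor alignment. The hard part in practice is deciding which alignments are actually possible given the timing windows and logical relationships.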
Using the simple slew model I mentioned earlier is not enough since the detailed shape of the waveform has a strong effect on delay. The diagram below shows a falling signal with four different signal shapes all with the same slew, but there is as much as a 10% difference in the output timing.
CMOS is largely unidirectional. The input of a gate affects the output and not the other way round. But, in fact, due to weak linkage through capacitance, this is not 100% true. The differences can be quite large.
The delay of a circuit is a strong function of the supply voltage. But the supply voltage (locally) is affected by switching activity in that region of the chip. At one level, the delay on a gate can be affected by switching activity on nearby nets, not only through crosstalk but through voltage droop. The fact that the power networks are so big makes things worse.
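To get a feel for how sensitive delay is to supply voltage, one widely used empirical approximation is the alpha-power law, in which gate delay scales as Vdd / (Vdd - Vt)^alpha. This is a swapped-in textbook model, not something from the presentation, and the threshold voltage, exponent, and nominal supply below are assumed values chosen only for illustration.

```python
def relative_delay(vdd, vt=0.30, alpha=1.3):
    """Alpha-power-law delay in arbitrary units (vt and alpha are assumed values)."""
    return vdd / (vdd - vt) ** alpha


nominal_vdd = 0.70  # hypothetical low-voltage operating point, in volts
for droop_pct in (0, 5, 10):
    v = nominal_vdd * (1 - droop_pct / 100)
    ratio = relative_delay(v) / relative_delay(nominal_vdd)
    print(f"{droop_pct:2d}% droop -> Vdd = {v:.3f} V, delay x{ratio:.3f}")
```

With these assumed numbers, the delay penalty grows faster than the droop itself, and the effect gets worse as the supply approaches the threshold voltage. That is why low-voltage designs are so sensitive to local switching activity.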
Further complicating things is that there is a disconnect between STA and rail analysis and fixing. STA assumes a certain amount of voltage droop, and rail analysis has to deliver it. But, in fact, almost all parts of the circuit will not be nearly that bad. This is another area where the current approach is too pessimistic and better accuracy will be required.
There are large numbers of sources of variation. The challenge is that as the amounts of anything get smaller, the often fixed errors become larger as a percentage and so have a bigger effect on timing. For example, if lithography line-edge roughness is fixed (which it largely is), then this will have a greater effect the narrower the metal.
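The line-edge roughness example can be made concrete with a little arithmetic. The sketch assumes wire resistance is inversely proportional to line width (fixed thickness and length) and models roughness as a uniform loss of width; the 2 nm figure and the line widths are invented for illustration.

```python
def resistance_increase(width_nm, edge_loss_nm):
    """Fractional rise in wire resistance when the line is narrower than drawn.

    Assumes resistance is inversely proportional to width.
    """
    return width_nm / (width_nm - edge_loss_nm) - 1.0


# The same hypothetical 2 nm width loss applied to progressively narrower lines.
for width in (40.0, 20.0, 10.0):
    print(f"{width:.0f} nm line: {resistance_increase(width, 2.0):.1%} higher resistance")
```

The absolute error is the same in each case, but its relative effect on resistance, and therefore on wire delay, grows quickly as the line narrows.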
There are also layout-dependent effects. Some of these are due to stress effects, meaning that the timing of a gate can depend on what is around it. There are also OPC effects, where the correction applied depends on an increasingly large radius and this too can affect timing.
How to model on-chip variation (OCV) is a challenge. The old approach of using a single derating factor for all instances is obsolete. A realistic approach has to assume statistical variability between cells. The standard LVF (used to hold timing models) uses stage-based on-chip variation (SBOCV). Going forward, some form of statistical OCV is likely required with a good tradeoff between runtime and accuracy.
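A small sketch shows why a single derating factor is so pessimistic on deep paths. It assumes every stage has the same nominal delay and an independent Gaussian variation with the same sigma, which is a simplification; the numbers are made up, and the flat derate is chosen so that both methods agree for a single stage.

```python
import math


def flat_ocv_late(n_stages, d_nom, derate):
    """Late path delay with one derating factor applied to every stage."""
    return n_stages * d_nom * (1.0 + derate)


def statistical_late(n_stages, d_nom, sigma, n_sigma=3.0):
    """Late path delay if per-stage variations are independent (assumed),
    so their sigmas add in root-sum-square fashion."""
    return n_stages * d_nom + n_sigma * sigma * math.sqrt(n_stages)


d_nom, sigma = 20.0, 1.0        # ps per stage, hypothetical
derate = 3.0 * sigma / d_nom    # flat derate matching 3 sigma on one stage

for n in (1, 4, 16, 64):
    flat = flat_ocv_late(n, d_nom, derate)
    stat = statistical_late(n, d_nom, sigma)
    print(f"{n:2d} stages: flat {flat:7.1f} ps, statistical {stat:7.1f} ps")
```

Because independent variations partially cancel along a path, the statistical margin grows with the square root of the depth while the flat-derate margin grows linearly. Depth-dependent derating and statistical OCV are both ways of recovering that lost margin.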
I said at the start that STA is about ensuring that the correct signal is latched. But with all the complexity of modern design with signal- and layout-dependent effects and extensive non-linear interactions, this is not really a well-defined concept. The standard deviation of the slack becomes large compared to the actual slack (or even the actual delay). So the first implication is that there are different approaches to fixing slack depending on whether timing is being missed due to the wrong timing, or just because the spread has got so large that the likelihood of missing timing gets large.
The ideal STA would actually take all the data and translate it into yield. After all, ultimately that is what the end user cares about. Aiming for 100% is probably too extreme since the only way to get there is extensive over-design of everything. Too low a percentage yield and the chip becomes too uneconomic to manufacture (with 0% obviously being catastrophic).
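One simple way to picture the translation from slack to yield is to treat each path's slack as a Gaussian and ask how likely it is to be non-negative. The sketch below assumes Gaussian slack and fully independent paths, neither of which holds in a real design where paths share cells and wires, and the slack numbers are invented.

```python
import math


def pass_probability(mean_slack, sigma_slack):
    """Probability that a Gaussian-distributed slack is non-negative."""
    z = mean_slack / (sigma_slack * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def timing_yield(paths):
    """Yield if every path must pass and paths are independent (assumed)."""
    result = 1.0
    for mean_slack, sigma_slack in paths:
        result *= pass_probability(mean_slack, sigma_slack)
    return result


# Hypothetical (mean slack, sigma) pairs in ps. The first path has a small mean
# slack; the second has a comfortable mean slack but a much larger spread.
paths = [(10.0, 5.0), (30.0, 20.0)]
for mean_slack, sigma_slack in paths:
    print(f"mean {mean_slack} ps, sigma {sigma_slack} ps: "
          f"pass probability {pass_probability(mean_slack, sigma_slack):.4f}")
print(f"combined timing yield: {timing_yield(paths):.4f}")
```

In this toy case the path with the larger mean slack is actually the bigger yield risk because of its spread, which is the distinction drawn above between missing timing because the timing is wrong and missing it because the spread has grown.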
One trend is the increasingly statistical nature of variation. We are moving from advanced OCV (AOCV) to statistical OCV and moving towards statistical STA (SSTA) that was pioneered by IBM some time ago but was considered too expensive for everyone else back then. The variation-aware timing engine needs to get pushed up into the physical design (giving variation-aware optimization) and then to synthesis (creating design for variation, where the design already reduces variation).
Clearly, all of these challenges bring a lot of complexity into STA. The computational cost is huge, which in turn drives the requirement for massive parallelization. The big tradeoff is runtime vs. accuracy and not just overall, but for each of the various steps that make up the overall STA. As with almost everything in EDA, it's not going to get better: designs are going to get larger, and the number of factors that can affect timing (and their non-linearity) is going to get worse.
It's almost amazing STA works at all in a modern node, but somehow the technology has, by and large, kept pace.