Test is the red-headed stepchild of EDA. FinFETs, self-aligned quadruple patterning, or parallel simulation get all the attention. But a chip doesn't only have to be manufactured, it also has to be tested. At CDNLive EMEA I met Erik Jan Marinissen of imec. Before joining imec, Erik Jan was at Philips Semiconductors (yes, it really was plural; now, of course, it is NXP). He heads up imec's (yes, it really is spelled with a lower-case "i") test projects.
It turns out I have a little bit of history with Philips test from my VLSI Technology days. Back in the early 1990s, test was still mostly done by taking functional vectors and then fault grading them. This was done with a conventional "stuck-at" model: one signal in the design was assumed to be stuck at 0 (so it remained 0 even when it was meant to be 1), the vectors were run, and the fault was considered detected if one of the outputs of the circuit differed, meaning the part would fail on the tester. Done one fault at a time like that, grading would be horrendously slow. The trick was to use parallel fault simulation, where many faults could be analyzed in parallel...but it was still slow.
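The idea behind parallel fault simulation can be sketched in a few lines. This is a minimal illustration, not the historical Philips or VLSI Technology code: test patterns are packed into the bits of a machine word, so one pass of bitwise operations evaluates the circuit for all of them at once, and each candidate stuck-at fault is checked by forcing one signal word to all-0s or all-1s. The toy circuit and fault list are invented.

```python
# Bit-parallel stuck-at fault grading (illustrative sketch).
# Each input variable holds one bit per test pattern, so bitwise
# operations evaluate the circuit for all patterns simultaneously.

WIDTH = 8                  # patterns per pass (a real tool would use 32/64)
MASK = (1 << WIDTH) - 1

def circuit(a, b, c):
    """Toy netlist: out = (a AND b) OR (NOT c), evaluated bitwise."""
    n1 = a & b
    n2 = ~c & MASK
    return (n1 | n2) & MASK

def fault_grade(patterns):
    """Return the set of stuck-at faults detected by the given patterns."""
    # Pack each input's values across all patterns into one integer.
    a = b = c = 0
    for i, (pa, pb, pc) in enumerate(patterns):
        a |= pa << i; b |= pb << i; c |= pc << i
    good = circuit(a, b, c)

    detected = set()
    # For each fault, force one signal word to all-0s or all-1s and compare.
    for name, idx in (("a", 0), ("b", 1), ("c", 2)):
        for stuck in (0, MASK):
            inputs = [a, b, c]
            inputs[idx] = stuck
            if circuit(*inputs) != good:   # output differs on some pattern
                detected.add((name, "SA0" if stuck == 0 else "SA1"))
    return detected

faults = fault_grade([(1, 1, 1), (0, 0, 0), (1, 0, 1), (0, 1, 1)])
```

Even this sketch shows why the approach helped: one evaluation of the circuit covers a whole word of patterns, but the loop over faults remains, which is why it was still slow.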
In order to get coverage, the functional tests were getting very long. In some cases, testing a chip was apparently costing more than manufacturing it. Getting adequate test coverage was also getting harder. Everyone knew the solution was scan, but there were various barriers to adoption, from the lack of scan test pattern generation to the amount of memory on the testers. Philips had its own internal scan test program called AMSAL, developed at Philips Hamburg. VLSI Technology had a partnership with Philips on various technologies, and as part of that we had access to AMSAL. We also had a joint development with the Hamburg team in digital video processing, which was the new, new thing, so I had to visit occasionally.
One other project in test that Philips developed in Hamburg was layout-aware testing. Everyone knew that the stuck-at model didn't really represent the errors that could occur on a chip, but it was algorithmically simple and did a surprisingly good job of getting high test coverage.
Another test approach that came into vogue around that time was Iddq testing. Very early in the test, after a few initialization vectors, there would be "the" Iddq vector. At that point, the chip should be quiescent, doing nothing, so the power it drew should just be leakage through all the transistors: not zero, but not much. It turned out that many possible manufacturing defects caused the chip to draw significantly more current. If it did, there was no need to run the test program proper, and the chip was rejected with a very low test time. Like the stuck-at model, it seems like this should not work well, but in fact a huge percentage of rejects failed Iddq testing, without requiring all the vectors to be run to find a more specific problem. Iddq testing even picks up some faults that the normal test program might miss.
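The screening flow is simple enough to sketch. This is only an illustration of the idea; the current limit and the function names are invented, not a real tester program.

```python
# Minimal sketch of an Iddq screening step (all values hypothetical).
IDDQ_LIMIT_UA = 50.0   # pass/fail threshold in microamps (assumed)

def iddq_screen(measure_iddq_ua, run_full_scan):
    """Apply the cheap Iddq check first; only run the long scan test if it passes."""
    if measure_iddq_ua() > IDDQ_LIMIT_UA:
        return "REJECT (Iddq)"   # leaky die rejected with almost no test time
    return "PASS" if run_full_scan() else "REJECT (scan)"

# A die drawing 430 uA quiescent current never reaches the full scan test.
result = iddq_screen(lambda: 430.0, lambda: True)
```

The economics come from the order of the checks: the expensive scan program only runs on die that already look plausible.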
Layout-aware (also called cell-aware) test looked at faults that might actually occur in a design, generally opens (a metal line break) and shorts (two metal lines bridged together). This was in addition to the stuck-at model, which did a reasonable job of testing transistor failure modes. There is also another whole class of faults, known as delay faults, where the design works at a slow clock-rate but fails at the speed the chip would run during actual use. One solution to this was at-speed testing, but that required testers that could run that fast. Anyway, that Hamburg test team was eventually acquired by Mentor and they took over the program.
Eventually, the scan test approach became the mainstream, and the amount of test time dropped again. But test times were long, and getting longer, and once again the prospect of test costing more than manufacture loomed ahead. But test compression got that under control. For more details on this, see my post Modus Test Solution—Tests Great, Less Filling, although the current product name is now Modus DFT Software Solution.
Erik Jan had worked, with some students, on testing 3D ICs, where multiple die are stacked and connected with through-silicon vias (TSVs). There are two big test issues with 3D packaging. The first is the "known good die" (KGD) problem. In a single-die package, if a die passes wafer sort (testing before the wafer is sawn up) but then fails the more extensive test after packaging, the cost of the packaging is lost, but the die was always bad. In a 3D package, if a die passes wafer sort, gets packaged, and then fails, it takes one or more good die with it. For example, a hybrid memory cube (HMC) consists of a logic die and 4 DRAM die. If one of those DRAM die passes wafer sort, gets packaged, and is then found to be bad, it causes 3 good DRAM die and a logic die to be discarded as well.
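A back-of-envelope calculation shows why KGD matters so much more for stacks. All the costs here are invented numbers for illustration, not real HMC economics.

```python
# Hypothetical costs, purely for illustration.
DRAM_DIE = 10.0    # cost of one DRAM die
LOGIC_DIE = 15.0   # cost of the logic die
PACKAGE = 5.0      # cost of stacking/packaging

# Single-die package: a wafer-sort escape loses only the packaging cost,
# since the die itself was always bad.
single_die_loss = PACKAGE

# HMC-style stack (1 logic + 4 DRAM): one bad DRAM die also scraps the
# three good DRAM die, the good logic die, and the packaging.
stack_loss = 3 * DRAM_DIE + LOGIC_DIE + PACKAGE
```

With these (made-up) numbers, a test escape costs ten times as much in the stack as in the single-die package, which is why wafer-sort quality has to be so much higher for 3D integration.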
The other problem, one that Erik Jan worked on with Cadence, is how you get the vectors up to the die above, since the package pins that the tester can access only connect to the lowest die of the stack. The basic idea is to build "elevators" that take the scan test data in at the bottom die and carry it up to the die being tested. That project wrapped up. But Erik Jan had two of his students, one also in the Cadence Academic Network, presenting their work at CDNLive.
Leonidas has been looking at the problem of modular test and realistic IR drop. Scan test tends to have a much higher activity level than normal operation, running a pseudo-random sequence of 1s and 0s through the scan chain very fast, stimulating all the combinational logic excessively. In a modular test, where each block on the chip is tested in turn, there is an additional problem.
If nothing more is done, then the vectors being applied to the module under test (MUT) will also randomly jiggle inputs to other blocks. This will usually be a lot more activity than these other modules would experience in normal operation, perhaps ten times as much, and so can pull a lot of current and cause IR drop at the MUT, and spurious failures. The obvious thing to do is to isolate the MUT so the other blocks are quiescent. But that has the opposite problem. Now these blocks draw no current, so the power supply to the MUT is unrealistically good, and so there can be spurious passes, and so test escapes.
Leonidas's solution is to add a programmable toggle generator to each block. If this is tuned appropriately, then it will toggle the other modules at a realistic level while the MUT runs its own test. This will neither be optimistic (power supply too good) nor pessimistic (IR-related power supply droop) and so minimizes both escapes and false positives.
The toggle generator consists of counters and comparators to generate the sequence. It is set up in two stages.
The chart above shows the difference between pessimistic (red), optimistic (blue), and realistic (green) behavior, the last generated with the programmable toggle generator.
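The counter-and-comparator idea can be sketched in software. This is a hedged approximation of the concept, not the actual design Leonidas presented: a free-running counter wraps around at a programmable period, and a comparator threshold sets what fraction of cycles toggle the idle block's flops.

```python
# Sketch of a programmable toggle generator (period/threshold values assumed).
def toggle_stream(period, threshold, cycles):
    """Yield 1 on cycles where the idle block should toggle, else 0."""
    count = 0
    for _ in range(cycles):
        yield 1 if count < threshold else 0   # comparator output
        count = (count + 1) % period          # wrap-around counter

# Tune period/threshold so idle blocks see ~25% activity, roughly what they
# would draw in normal operation, instead of 0% (optimistic) or the near-100%
# a scan pattern would impose (pessimistic).
activity = sum(toggle_stream(period=8, threshold=2, cycles=800)) / 800
```

The point of making both parameters programmable is that the realistic activity level differs per block and per design, so it can be dialed in after silicon characterization rather than fixed at design time.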
Zhan Gao is working on cell-aware test. As I said above, this looks at opens and shorts in addition to stuck-at faults on the inputs and outputs of the standard cells. Since an open can only occur if there is a run of metal (or a via), and a short can only occur when two metal lines are close, it is essential to look at the layout to see which problems might occur in practice. This is especially a problem with shorts, since a naïve approach will end up considering all pairs of signals, which rapidly becomes computationally infeasible.
One way to address this is to look at the layout polygons and analyze the situation precisely. As it happens, I wrote VLSI Technology's original circuit extractor, so this is a domain where I have a lot of knowledge. It is non-trivial to "look at the layout", especially when you need to find signals that are close (so they might short) without wasting a lot of time looking too far afield. Her approach is to use PVS+QRC to get the parasitics for each segment of interconnect.
For the potential opens, anywhere there was a resistor, there was the potential for an open. The Rs lead to pairs of transistors for which the possibility that they are not actually connected needs to be considered. By "considered", she meant adding vectors, if necessary, to the scan test sequence to ensure that the problem would be detected.
For the potential shorts, she looked for high C values, indicating metal segments that were close enough to be coupled, meaning that they were probably close enough to short too. Those pairs also needed to be "considered". The threshold she used in the technology she was testing with, Cadence's generic 45nm library, was 1×10⁻¹⁹ F. Only net pairs for which at least one segment pair is above this threshold are considered. This gets rid of the quadratic explosion of considering all the net pairs. In the example she quoted, it went from 286 to 16 pairs, and in one complex flop from 2800 to 74.
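The pruning step amounts to a threshold filter over the extracted coupling capacitances. The sketch below is illustrative: the net names and capacitance values are made up, and only the 1×10⁻¹⁹ F threshold comes from the talk.

```python
# Filter candidate bridge (short) faults by extracted coupling capacitance.
C_THRESHOLD = 1e-19   # farads; the threshold quoted for the generic 45nm library

def bridge_candidates(coupling_caps):
    """coupling_caps maps (netA, netB) segment pairs to capacitance in farads.
    Keep a net pair if any of its segment pairs exceeds the threshold."""
    pairs = set()
    for (net_a, net_b), cap in coupling_caps.items():
        if cap > C_THRESHOLD:
            pairs.add(tuple(sorted((net_a, net_b))))
    return pairs

caps = {
    ("n1", "n2"): 3.2e-19,   # close parallel segments -> candidate short
    ("n1", "n3"): 4.0e-21,   # far apart -> pruned
    ("n2", "n4"): 1.5e-19,   # close enough -> candidate short
}
candidates = bridge_candidates(caps)
```

Using the parasitic extraction this way sidesteps direct geometric search: segments that could realistically short already show up as significant coupling capacitors, so the quadratic all-pairs problem collapses to the handful of pairs the extractor flagged.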
Zhan's approach does a once-per-cell analysis of the layout to build what she calls the defect detection matrix (DDM). She then generates additional tests. There are various optimizations, such as finding signals where one or more inputs can be considered don't-cares, which can be used to further reduce test time. A surprising number of faults, 36%, turn out to be undetectable as static faults: they are either redundant or show up only as delay faults. For example, if two metal segments are joined by two vias, and one via is open, then this is a delay fault (the timing will be different, but the signal will still change). She plans to cover delay faults like this in the future.
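One plausible way a DDM drives pattern generation is as a covering problem: each matrix entry records whether a pattern detects a cell-internal defect, and a small set of extra patterns is chosen to cover every detectable defect. The greedy selection below is my own sketch of that idea, with an invented matrix; it is not Zhan's actual algorithm.

```python
# Sketch: pick extra test patterns from a defect detection matrix (DDM).
# ddm[d][p] is True if pattern p detects cell-internal defect d.

def greedy_pattern_select(ddm, patterns):
    """Greedy set cover: fewest patterns that catch every detectable defect."""
    uncovered = {d for d, row in ddm.items() if any(row[p] for p in patterns)}
    chosen = []
    while uncovered:
        # Pick the pattern detecting the most still-uncovered defects.
        best = max(patterns, key=lambda p: sum(ddm[d][p] for d in uncovered))
        chosen.append(best)
        uncovered -= {d for d in uncovered if ddm[d][best]}
    return chosen

ddm = {   # invented example matrix for one cell
    "open_via3":   {"p0": True,  "p1": False, "p2": False},
    "bridge_n1n2": {"p0": True,  "p1": True,  "p2": False},
    "open_m2":     {"p0": False, "p1": False, "p2": True},
}
extra = greedy_pattern_select(ddm, ["p0", "p1", "p2"])
```

Because the analysis is once per cell, the cost is amortized over every instance of that cell in every design using the library.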
Also, in the future, she intends to derive the candidates directly from the layout rather than just looking at the parasitics. She further wants to consider transistor failure modes. For example, FinFETs are normally doubled or tripled and generally only one of the transistors of the pair or triplet will be faulty.
Cell-aware test seems to give a significant increase in test quality (fewer escapes) at modest cost in both increased test time and increased time to generate the test programs. The above chart shows the cells in the library (ranked by number of inputs) and how few additional vectors are required to catch all the opens and shorts that don't result in just a delay fault.