Dark Silicon may sound like a character from the latest Star Wars movie, but it actually refers to the power limitations on large SoCs, which make it increasingly difficult to power up the whole chip at the same time.
For years, we took advantage of a phenomenon known as Dennard Scaling, which goes all the way back to a paper by Dennard in 1974. It is one of the legs of the stool of Moore's Law. It says that in each process generation, the power is proportional to the silicon area, so the power density stays constant (because we can lower the voltage as the transistors get smaller). As a given sized chip goes from node to node, the power remains the same even though the functionality and speed are roughly doubling. So we could double the logic on a chip, run it faster, and keep the power the same. What's not to like? This, in turn, led to other famous laws such as Joy's Law, which says that peak computer speed doubles each year and, thus, is (was) given by a simple function, speed = 2 ^ (year-1984).
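As a rough illustration of the arithmetic behind Dennard Scaling, here is a minimal Python sketch. It assumes the textbook idealization, which is not spelled out in this post: a linear shrink factor k per generation, capacitance and voltage both scaling by 1/k, frequency scaling by k, and dynamic power per transistor going as C x V^2 x f. The function names are made up for this example. Joy's Law is included exactly as the formula above states it.

```python
import math


def dennard_generation(k: float = math.sqrt(2)) -> dict:
    """Relative changes after one ideal Dennard generation.

    Everything is relative to 1.0 for the previous generation.
    The scaling rules below are the assumed textbook idealization.
    """
    capacitance = 1 / k   # smaller transistors, less capacitance
    voltage = 1 / k       # supply voltage drops with dimensions
    frequency = k         # shorter gates switch faster
    # Dynamic power per transistor: C * V^2 * f, which works out to 1/k^2
    power_per_transistor = capacitance * voltage ** 2 * frequency
    # Transistors per unit area grow as k^2
    transistor_density = k ** 2
    return {
        "power_per_transistor": power_per_transistor,
        "transistor_density": transistor_density,
        "power_density": power_per_transistor * transistor_density,
    }


def joys_law_speed(year: int) -> float:
    """Joy's Law as quoted in the text: speed = 2 ^ (year - 1984)."""
    return 2.0 ** (year - 1984)


if __name__ == "__main__":
    print(dennard_generation())
    print(joys_law_speed(1994))
```

With k equal to the square root of 2, the transistor count per unit area doubles while the power per unit area stays at 1.0, which is the "double the logic, keep the power the same" bargain described above.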
However, Dennard Scaling has been broken for the last 10 years. It only works if most of the capacitance is intrinsic gate capacitance. This gradually became less and less true, with other parasitics becoming more and more significant, and by the time we get to FinFET the intrinsic gate capacitance is just a small part of the overall switch capacitance. The result was that voltage did not reduce as much as required to make Dennard Scaling work the way it had for the previous 30 years. The power limitations got so severe that it was no longer possible to increase the clock frequency. Famously, Intel's CTO at the time, Pat Gelsinger, pointed out that the power density would be such that chips would get as hot as the sun, which clearly wasn't going to happen. As a result, higher level laws like Joy's Law also broke down. Computer speed no longer doubles every year, or even every couple of years.
The graph below summarizes the situation today. Transistors continue to increase in number. Single thread performance is flat to declining, as is the clock frequency. Power is flat by definition, since we can't get any more out of the package.
The solution was multi-core. Instead of increasing the clock frequency, additional computing power was delivered in the form of multiple cores. One issue with that, beyond two or four cores, is that partitioning algorithms to make use of all the cores is a hard software problem. In some areas it is fairly easy (Photoshop, for example, with millions of pixels that can each be processed largely independently a lot of the time, or web servers handling thousands of clients who barely interact). But there is no general solution to increasing the performance of a design in the way that we had in the old days: if the software was too slow, then wait a couple of years. If the software is too slow on four cores, then it doesn't automatically double in speed on eight in most cases. So you see in the above graph that the number of cores was 10^0 (aka 1) until about 2005, and then it started to increase dramatically.
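One common way to see why going from four cores to eight does not double the speed is Amdahl's Law. The post does not name it, but it captures the same point. Here is a minimal sketch, assuming a hypothetical workload in which 90% of the work parallelizes perfectly and the rest is strictly serial; the function name and the 0.9 figure are illustrative only.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Speedup over one core when only part of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.9  # hypothetical: 90% of the work is parallelizable
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(n, p), 2))
```

Under that assumption, eight cores are only about one and a half times as fast as four, and the serial 10% caps the speedup at 10x no matter how many cores are added. Only a workload that is entirely parallel, like the pixel and web server examples, doubles cleanly.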
At the Cadence Mixed-Signal Summit, the keynote speaker, Professor Bora Nikolic of UC Berkeley, pointed out that multicore was a one-time improvement in performance. I'm not sure that is true. If you make a step transition from single core to an infinite number of cores then it is obviously true since you can't add any more cores. That is probably the situation Google and Facebook have. They have effectively an infinite number of cores and adding more isn't going to make search any faster (although it will increase the number of searches that can be handled simultaneously). But that is not the problem we face on chips. We have gone from two cores to four cores and the number continues to increase. In a play on Moore's Law, I call it Core's Law: the number of cores you can put on a chip doubles every couple of years (based simply on the number of transistors).
But then dark silicon comes into play. As we get more and more cores using more and more transistors, it turns out that we cannot naively turn them all on at once. The power dissipation is too high and the chip (or at least some of the cores) gets too hot. To prevent thermal runaway, there are a number of approaches. If there are unused cores, then the active tasks can be moved from the hot core to one in standby. Alternatively, the frequency can be throttled back a lot, or the frequency can be throttled back less and the voltage lowered (DVFS). It turns out the first approach is very effective provided there are spare cores. But that already assumes there is some dark silicon, in the form of unused cores.
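To see why lowering the voltage lets you throttle the frequency less, here is a minimal sketch based on the usual dynamic power relation, P = C x V^2 x f. It makes two simplifying assumptions that are mine, not the post's: leakage power is ignored, and under DVFS the voltage can be lowered in proportion to the frequency. The function names are hypothetical.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Dynamic switching power, P = C * V^2 * f (leakage ignored)."""
    return capacitance * voltage ** 2 * frequency


def frequency_only_scale(target_power: float) -> float:
    """Frequency multiplier needed to hit a power target at fixed voltage."""
    # At fixed V, power is linear in f.
    return target_power


def dvfs_scale(target_power: float) -> float:
    """Frequency multiplier needed when voltage is scaled with frequency.

    Assumes V is proportional to f, so power goes as the cube of the scale.
    """
    return target_power ** (1.0 / 3.0)


if __name__ == "__main__":
    target = 0.5  # cut power to half of nominal
    print("frequency only:", frequency_only_scale(target))
    print("DVFS:", round(dvfs_scale(target), 3))
```

Under these assumptions, halving the power by frequency throttling alone costs half the clock speed, while DVFS gets there at roughly 79% of the original frequency. Real voltage ranges are much narrower than this idealization suggests, so treat the numbers as a direction rather than a prediction.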
At IEDM in December in his keynote, Greg Yeric of ARM considered a 28nm design. If Dennard Scaling continued to be true, then each process generation would dissipate the same power even though the functionality was doubling. But Dennard Scaling has stopped, and without any advanced power management techniques, each process generation has more and more transistors that cannot be turned on within the original 28nm power budget, reaching 80% of the chip by the time we get to 5nm. See the diagram.
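The shape of that argument can be sketched with a toy model. This is not the data from the keynote. It assumes a fixed power budget, a transistor count that doubles each generation, and a hypothetical per-generation factor `power_shrink` by which the power of each transistor falls. Ideal Dennard Scaling corresponds to `power_shrink = 0.5`; the 0.7 used below is picked purely for illustration, as is the count of five generations between 28nm and 5nm.

```python
def dark_fraction(generations: int, power_shrink: float) -> float:
    """Fraction of the chip that must stay dark under a fixed power budget.

    generations:  process generations after the baseline node
    power_shrink: hypothetical factor by which per-transistor power falls
                  each generation (0.5 would be ideal Dennard Scaling)
    """
    # Each generation, full-chip power grows by 2 * power_shrink
    # (twice the transistors, each using power_shrink of the old power).
    lit_per_generation = 1.0 / (2.0 * power_shrink)
    lit = min(1.0, lit_per_generation ** generations)
    return 1.0 - lit


if __name__ == "__main__":
    for gen in range(6):
        ideal = dark_fraction(gen, 0.5)
        stalled = dark_fraction(gen, 0.7)  # illustrative value only
        print(gen, round(ideal, 2), round(stalled, 2))
```

With ideal scaling nothing is ever dark. With the illustrative 0.7, the dark fraction compounds every generation and passes 80% after five, which shows how quickly a modest shortfall in power scaling adds up.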
All the power reduction techniques we have at our disposal, such as turning off clocks to blocks or turning off power to blocks, create dark silicon. Of course at the system level this might not matter since we are presumably turning off blocks we do not need to use (such as turning off the transmit/receive logic on a phone when there is no transmission going on). Voltage and frequency scaling is less abrupt, but it creates dim silicon that we might like to run faster but cannot. But at the limit, where we really want to run the whole chip at full speed all the time, we end up taking the hit in throughput (turn some cores off or run them more slowly). Even these techniques mean that more and more of the chip area and the extra transistors we get at each process node have to be dedicated to handling all this complexity of power architecture in a way that we never did 20 years ago when we just ran the stuff we wanted at the clock frequency of the chip.
There are other power management techniques such as heterogeneous processing. Instead of using one of the cores of a multicore microprocessor to implement some functionality, a specialized offload processor, such as those built using Tensilica technology, can be used. This is typically used on smartphones for audio decode. But the freed-up core pretty much has to stay dark or no power is saved. So even these system-level techniques create dark silicon, too.
Go back and look at the graph at the beginning. The number of cores is expected to flatten, too, due to dark silicon. With a big increase in system complexity and sophisticated power management, we may be able to ensure that only silicon that doesn't matter is dark, but there is no guarantee. Dark silicon is going to be a real challenge.
By the way, nothing to do with dark silicon, but there is another Joy's Law, this one about business: "No matter who you are, most of the smartest people work for someone else." The corollary is that being open and partnering is the key to long-term success, compared to only using a single company's employees.