Post-Silicon Compute

10 Jan 2018 • 6 minute read

At the SEMI Strategic Materials Conference (SMC) a few weeks ago, Lucian Shifren of Arm Research talked about Post Silicon Compute. I always like presentations by Arm Research on semiconductor stuff. Since they don't have their own process or their own fab, but are deeply involved, they have a more independent view than people who work for semiconductor companies, who have more of an agenda. Plus, they see everyone's secrets.

A memorable presentation at IEDM a couple of years ago was by Greg Yeric of Arm Research, which I wrote about in my post Moore's Law at 50: Are We Planning for Retirement? One thing he talked about, which I hadn't really thought about before, was how device engineers (the sort of people who attend IEDM) spend all their time worrying about the center of the Gaussian distribution of performance, whereas design engineers don't care about the center since they have to design for the best and worst cases. Of course, it's actually more complicated than that in our multi-corner world. The result is that the variability (standard deviation) of device performance is very important to designers, but device engineers spend no time on it.

This was actually a point that Lucian made in his presentation, too, with the diagram above.
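To make the center-versus-spread point concrete, here is a minimal sketch in Python. The two devices and all their numbers are hypothetical (they are not from Greg's or Lucian's slides), and I am assuming a simple model where the designer signs off at the 3-sigma slow corner of a normal distribution.

```python
def slow_corner(mean, sigma, k=3.0):
    """Return the k-sigma slow corner of a normally distributed performance metric."""
    return mean - k * sigma


# Hypothetical devices, arbitrary performance units (illustration only).
# Device A: better typical performance, but a wide spread.
# Device B: slightly worse typical performance, but a tight spread.
device_a = {"mean": 100.0, "sigma": 10.0}
device_b = {"mean": 95.0, "sigma": 5.0}

corner_a = slow_corner(**device_a)
corner_b = slow_corner(**device_b)

print(f"A: typical {device_a['mean']}, 3-sigma slow corner {corner_a}")
print(f"B: typical {device_b['mean']}, 3-sigma slow corner {corner_b}")
```

Under these assumed numbers, device A wins on the center of the distribution, but device B wins at the corner the designer actually has to close timing against.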

Moore's Law

Lucian, like Greg a couple of years ago, started from the idea that Moore's Law is ending. In fact, depending on which aspect of Moore's Law you look at, it ended a long time ago (power scaling), ended recently (cost-per-transistor scaling), or still has several nodes to go (technical barriers). He showed the above chart from the Economist with all the predictions of when there would be no Moore. I think it is quite likely that the technological barriers will continue to be solvable, but not in a way that is economical. In this context, that means it will cost more to build a fab to run the process than the fab will generate in profit over its lifetime.

I showed this chart of Lucian's in my post SEMI Strategic Materials Conference. I won't explain every abbreviation, but the future includes triple patterning, self-aligned quadruple patterning, double-patterned EUV, direct-write E-beam, directed self-assembly, and more.

This is another graph that he showed: the heat flux trend over time. Bipolar ended its long run as the IC technology of choice when the heat got so bad that even water cooling wasn't enough. The CMOS line shows when microprocessors ran into the power wall and had to switch to multicore. Even with that, the highest-powered systems need water cooling. The Palladium Z1 emulator is water cooled, and it is not just big datacenter racks like that: my son's gaming machine is water cooled, too.

Have you heard of DTCO, design-technology co-optimization? If not, you can read my post CDNLive: Design Technology Co-Optimization for N7 and N5. Basically it is adding little tweaks to the process that turn out to allow, for example, a track to be cut out of the standard-cell height. In years gone by, the people who designed processes didn't have a clue how they were used. They just had to get one transistor working and they were done. However, the process, for good and bad, has a big effect on library architecture and so, in turn, on the overall chip area. For example, Lucian showed the above standard cell designs. This is a flip-flop in 65nm, drawn a second time in 65nm but using the 32nm cell architecture (not even 16nm, let alone 7nm, which is far more restrictive still). That means dummy poly on the ends, unidirectional poly, and more. The result is that the process complexity has made the cells 21.7% larger (before, obviously, accounting for the dimensional shrink).
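Here is a rough back-of-the-envelope sketch in Python of what that 21.7% penalty does to the area scaling you would naively expect. The 21.7% figure is from Lucian's slide. Everything else is my assumption: in particular, I treat the node names as if they were real linear dimensions and apply an ideal shrink, which is not how real processes behave.

```python
# From the talk: the restrictive cell architecture makes the cell 21.7% larger
# at the same drawn dimensions.
CELL_ARCHITECTURE_PENALTY = 1.217


def net_area_ratio(old_node_nm, new_node_nm, penalty=CELL_ARCHITECTURE_PENALTY):
    """Area of the new cell relative to the old one.

    Assumes ideal linear scaling by the ratio of node names (a simplification),
    then applies the cell-architecture penalty on top.
    """
    ideal_shrink = (new_node_nm / old_node_nm) ** 2
    return penalty * ideal_shrink


ideal = (32 / 65) ** 2          # area ratio with an ideal shrink and no penalty
net = net_area_ratio(65, 32)    # area ratio once the penalty is included

print(f"ideal area ratio 65nm -> 32nm: {ideal:.3f}")
print(f"net area ratio with penalty:   {net:.3f}")
```

Under these assumptions, the cell still gets a lot smaller, but a meaningful slice of the shrink is handed back to the design rules.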

What Comes Next?

I would say that the received wisdom from the semiconductor companies is that the next step in transistors is going to be gate-all-around transistors where multiple nano-wires run through the gate, which completely surrounds them. But Lucian is more circumspect. I have seen other analysis that shows that you will need at least three nano-wires to get performance better than FinFET in the previous generation (GAA in 5nm vs FinFET in 7nm), but he thinks that is not good enough. So he has a long list of technologies that he doesn't believe in:

Tunnel FETs or TFETs
Negative-capacitance FETs or NCFETs
Gate-all-around FETs or GAA
Vertical FETs, usually also proposed as gate-all-around
...Any device ending in FET

Basically, he doesn't see any "crazy" FET that is going to come to the rescue.

Going Up a Level

To get increased performance, at least for processor-based systems, we have to address the fact that data movements dominate compute speeds (and energy). To address that we need to either move the data closer to compute, or move compute closer to data. However, there are major challenges with memory since all the technologies that are available have major problems:

DRAM: Can consume half of the system power, and eDRAM is very expensive, plus there are scaling and double-patterning challenges
SRAM: The 6-transistor cell has stopped scaling, so designs are going to 8-transistor cells
Flash eNVM: Not compatible with low-power CMOS processes and voltages, endurance limits (a limited number of writes) are a huge problem, and energy is high, especially for writes

As I have said in several previous posts, we will need to move away from a big homogeneous multicore processor to heterogeneous multi-processors with a main processor and additional specialized processors. The measure of goodness is throughput per Joule, basically how much computation we can get done under a given power envelope. However, even with these approaches, we are fast reaching the efficiency limits.
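As a minimal sketch of the throughput-per-Joule metric, here it is in Python. The two processors and their numbers are hypothetical placeholders of mine, not figures from the talk. The only thing the code relies on is the unit identity that operations per second divided by Watts (Joules per second) gives operations per Joule.

```python
def ops_per_joule(ops_per_second, watts):
    """Throughput per Joule: (ops/s) / (J/s) = ops/J."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return ops_per_second / watts


def throughput_under_budget(efficiency_ops_per_joule, power_budget_watts):
    """Ops per second achievable when the power envelope is the limit."""
    return efficiency_ops_per_joule * power_budget_watts


# Hypothetical numbers, for illustration only.
general_purpose = ops_per_joule(ops_per_second=1e11, watts=50.0)
accelerator = ops_per_joule(ops_per_second=4e11, watts=20.0)

print(f"general-purpose core: {general_purpose:.1e} ops/J")
print(f"specialized block:    {accelerator:.1e} ops/J")
print(f"specialized block under a 10 W budget: "
      f"{throughput_under_budget(accelerator, 10.0):.1e} ops/s")
```

The point of framing it this way is that once the power envelope is fixed, the only way to get more computation done is to raise the ops-per-Joule number, which is the argument for specialized processors.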

In the longer term still, we will end up moving away from von Neumann architectures to something more "brain like." However, the ultimate goal of neuromorphic computing will require a non-volatile switch, which is not MOS by definition. Compute and memory will be intermixed like in our brains.

Super-Duper-Memory-Switch

What we need is a super-duper-memory-switch that has all the attributes over in the right-hand column. But we are stuck today with SRAM, DRAM, and NAND flash, which are a long way off. The closest things we have are the new "storage-class memories" such as the Intel/Micron 3D XPoint (Intel now calls it Optane), which sit in the middle of the memory hierarchy between things we think of as "memory" and things we think of as "storage". They are non-volatile but fast, something we've never had before.

Conclusion

Moore's Law is, indeed, ending. But that is only one part of the problem:

MOSFETs have run out of steam, with the newest nodes requiring more money for very limited improvements
FETs of all the weird types are not going to save us
Von Neumann compute is reaching its limit

For now:

Move compute to memory, or memory to compute
Wring the last few drops out of Moore's Law

Longer term:

The future of compute will look very different and will rely on non-FET switches
The future of the semiconductor industry will be oxides/insulators and not semiconductors
