Are General-Purpose Microprocessors Over?

10 May 2017 • 8 minute read

There is apparently a rule of thumb among journalists that when the headline of an article asks a yes/no question, the answer is always 'no'. Well, of course microprocessors are not over, so the answer follows the rule. But improvements in general-purpose microprocessors are over. The future will lie elsewhere. But let's start at the beginning.

I had lunch recently with Chris Rowen. When he became an IEEE Fellow, I interviewed him, and you can read some more background in He's a Jolly Good (IEEE) Fellow. He was a sort-of founder at MIPS and was the founder of Tensilica, which Cadence acquired in 2013. He was CTO of the IP group, but recently left to do a number of things in the area of deep learning. He does, however, still work with Cadence on Tensilica in general and on deep learning in particular, such as the recently announced Vision C5 Neural Network DSP.

The Future of General-Purpose Microprocessors

One thing we discussed was the future of microprocessors. He had recently attended a seminar where both David Patterson and John Hennessy presented on the topic. John was Chris's PhD supervisor and the founder of MIPS (and for 16 years was Stanford's president). David Patterson was at UC Berkeley and the main creator of Berkeley RISC, a program that he and his successors continue today with RISC-V.

The two of them wrote the book on computer architecture. And I mean that literally, as well as figuratively. The book is Computer Architecture: A Quantitative Approach. The first edition of the book was published in 1990. You can see a copy below, along with the 1990 edition of the professors.

It is a graduate-level textbook; if you want something more introductory, you might prefer their undergraduate book. Here is a picture I took yesterday, of Dave Patterson with the latest version of that book, released just a couple of weeks ago, now called Computer Organization and Design RISC-V Edition: The Hardware Software Interface.

There have been two trends in microprocessor development over the years. One is Moore's Law, or especially the Dennard scaling version of it, where at each process generation you used to get more performance, the same area, and lower power...what An Steegen of imec calls "happy scaling." The other is the series of developments in computer architecture, such as pipelining, instruction reordering, branch prediction, and cache architecture. About 15 years ago, I was at a Microprocessor Forum (now the Linley Microprocessor Conference, but originally organized by Michael Slater's Microprocessor Report), and one of the luminaries of computer architecture pointed out that architectural developments accounted for, I think, a factor of 8 improvement in performance. All the rest was Moore's Law (and I think you have to put cache memories in that column, too).

The history of electronic computers can be divided into a number of phases. The early computers during the Second World War were built out of tubes (valves in the UK). In the 1950s, they were built out of discrete transistors and other components. When the integrated circuit was first invented, a single chip was too small to hold a whole processor, so computers were built out of basic gates and gradually more complex components up until the Intel 4004, which is generally considered to be the first true microprocessor, released in 1971. After a decade or so, a powerful microprocessor could be built on a chip and various flavors of PCs came into existence, eventually dominated by x86. There were developments in computer architecture, and every couple of years a new process node would arrive. During the 80s and 90s, the joke was that "Andy gave a lot more performance, and Bill took it away." Andy being Andy Grove, the CEO of Intel, creating faster and faster processors. Bill being Bill Gates, then-CEO of Microsoft, creating more and more capable operating systems, but ones that required those faster processors to run.

Optimize Your Program by Waiting

However, during that era, the basic strategy for anyone whose application didn't run fast enough was to wait. A new computer would be along with more performance and that application would automatically run faster with essentially no effort on the part of the programmers. If it was a lot too slow, then wait for multiple ticks of Moore's Law. In the early 2000s, that came to an end. We left the era of happy scaling, where with each process generation we could increase the clock frequency without the power getting out of control. Pat Gelsinger, Intel's then-CTO, pointed out that if Intel kept increasing clock frequencies the way that they had historically, the power density of a microprocessor would be the same as a rocket nozzle.

So instead of using all those transistors to build a more and more complex microprocessor with a higher and higher clock rate, the growth moved into cores. The clock rate remained (roughly) the same, but each process generation could support chips with more cores. Now, if an application didn't run fast enough, waiting didn't help since single-core performance was barely increasing. The only way to make an application have higher performance was to make it use multiple cores. For some applications, this was relatively easy. It is not that hard to scale to dozens, or even thousands, of cores to serve a lot of internet connections that don't interact much. However, other applications didn't scale so well.

All schemes to make a program parallel run into Amdahl's law. This is actually a formula, but the simplest version is that if cores are so cheap that you can have as many as you want, and if most of the program is so parallelizable that it runs in negligible time, then the only thing left is the part of the program that cannot be parallelized. So if 5% of a program is the non-parallelizable portion, then the maximum speedup possible is 20X (the bit that is left).
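To make the arithmetic concrete, here is a minimal sketch of the usual statement of Amdahl's law: if a fraction s of the work is serial and the rest is spread perfectly over n cores, the speedup is 1 / (s + (1 - s) / n). The function names amdahl_speedup and max_speedup are my own, chosen for illustration, and the model assumes no overhead for communication or synchronization, which real programs do not enjoy.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Speedup over a single core under Amdahl's law.

    serial_fraction is the share of the run time that cannot be
    parallelized (0.05 for the 5% example in the text). The remaining
    work is assumed to divide perfectly across the cores, with no
    overhead; that is an idealization.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def max_speedup(serial_fraction: float) -> float:
    """Limit of the speedup as the number of cores grows without bound."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    # Show how the speedup flattens out for a program that is 5% serial.
    for cores in (1, 2, 8, 64, 1024):
        print(f"{cores:>5} cores: {amdahl_speedup(0.05, cores):.2f}x")
    print(f"limit: {max_speedup(0.05):.0f}x")
```

With a 5% serial portion, the speedup climbs quickly for the first few cores and then creeps toward 20X without ever reaching it, no matter how many cores you add.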

The conclusion of Hennessy and Patterson at the seminar was that it's all over. There are basically no more architectural tricks to improve the performance of a general-purpose microprocessor. Moore's Law will not improve single-core performance any more. Multi-core works for some applications but for many it does not. For everything else, which is actually many application areas, there is no way to get faster performance out of running the code on a general-purpose microprocessor.

So is that it? Nothing will ever run any faster?

Special-Purpose Processors

There is actually a let-out clause in the statement, and that is "general purpose." The history of computing up until now has used a basic von Neumann architecture. Without going into any details, the hardware was built to run any program, and then lots of people created all sorts of programs to do things that the people who designed the hardware had never even considered. Basic instruction sets didn't change much, so programs just got faster as the hardware got faster. You didn't need to build a special computer for most applications. You use the same computer to run synthesis as place and route, or circuit simulation. Or, for that matter, PowerPoint, or an airline booking system.

There is one area in EDA where that is not true: emulation. An emulator is a long way from a general-purpose processor; it is highly specialized. It does one thing, emulate designs, but it does that much, much faster than a general-purpose processor can, which is why people are prepared to purchase emulators. But it is very specialized. You can't run place and route on an emulator, let alone an airline booking system*.

On a smaller scale, that is what Tensilica does. Through specialized architecture, instruction set extensibility, and a lot of clever design around data movement, the Tensilica approach allows someone to build a specialized processor for a specific task, such as visual recognition, or wireless modems, or decompressing high-fidelity music. Software is software, so at some level you can always do these things on a general-purpose processor. However, there are performance constraints (if you are making a phone call on an LTE network, you want it to work in real time) and power constraints (you don't want the phone to melt or the battery to last just five minutes). These are important. In datacenters or the cloud, there are racks and racks of general-purpose microprocessors. But many of these come with a specialized processor, too:

Google puts a TPU (Tensor Processing Unit) on many (all?) of their new servers
Microsoft Azure puts an Altera FPGA on all of their new servers
Some servers come with an NVIDIA GPU as a processor, not so much for graphics as for doing the training phase of neural networks

Even datacenters, with their almost infinite compute power, are going the specialist processor route. These are specialized but not so specialized that they can't be used for multiple applications, just not for most things. What about mobile? That has slower processors and much tougher power envelopes to live with. And IoT makes mobile look luxuriously provisioned with processing power and batteries. Everywhere, specialized processors are the way to get increased performance now that simply waiting no longer works.

The video below shows Chris himself discussing how configurable processors can be used to reduce power compared to a general-purpose processor (4' 38").

The Future of Computing

So the future of computing is going to be different from the past. No longer will we have just an off-the-rack processor that runs any program. For things that need exceptional performance we will need to use specialized processors (or, perhaps, in the limit build special SoCs). But, as Geoffrey Burr said at IRPS (see The IRDS Roadmap at IRPS), when you can't use von Neumann for everything, you're not going to like it.

*Yes, I know, technically you can run anything on any universal computer, in the case of an emulator by emulating, say, a mainframe running an airline booking system, but you won't like the performance.