Paul McLellan

Sophie Wilson: The 2020 Wheeler Lecture (The 6502 to Multicore)

10 Jun 2020 • 7 minute read

Since I was an undergraduate studying computer science at what was then called the Cambridge Computer Laboratory, I am on their mailing list. Each year, I get invited to the Wheeler Lecture, which is normally held onsite at the computer science department. Like so much else, this year it was held online. That has the obvious downsides we have all become familiar with, but it also meant that over 500 people could "attend", far more than could fit in Lecture Theater One of the William Gates Building. As Ann Copestake, the current head of the department, pointed out, "maybe lecture theaters like this are going to be obsolete". She had needed to go into the department anyway, and so had actually given the introduction from the empty lecture theater.

The Wheeler lectures are named after David Wheeler, one of the principal designers of EDSAC, the first practical stored-program digital computer, and among the first people ever to write a program stored in a computer's working memory (mercury delay lines). He was also one of the authors of what might be the first book on computer science, usually just called "Wilkes, Wheeler, and Gill". The actual title of the 1951 book was The Preparation of Programs for an Electronic Digital Computer, with special reference to the EDSAC and the use of a library of subroutines. This book, and David Wheeler in particular, are credited with the invention of the subroutine, and with the idea of having a library of commonly used segments of code that could thus be reused.

 David Wheeler died in 2004. His wife, Joyce, who was an astrophysicist, attended the lecture and told us all that "David was a pioneer in so many fields, he would be delighted for this lecture to be given online in his honor".

My Take

I have written about several of the themes in the presentation. Here are some of the more recent posts on microprocessor architecture:

  • Domain-Specific Computing 1: The Dark Ages of Computer Architecture
  • Domain-Specific Computing 2: The End of the Dark Ages
  • Domain-Specific Computing 3: Specialized Processors
  • Dark Silicon: Not a Character from Star Wars
  • Fifty Years of Computer Architecture: The First 20 Years
  • Fifty Years of Computer Architecture: The Last 30 Years

The Wheeler Lecture

This year, the Wheeler Lecture was delivered by Sophie Wilson from her living room. In her introduction, Ann gave a very brief history of Sophie's career. She was recruited by Hermann Hauser, after graduating from the Cambridge Computer Laboratory, into the company that became Acorn. Sophie and Steve Furber created the prototype of the BBC Micro under immense time pressure, and later co-designed the Acorn RISC Machine (ARM). In 1999, she developed Firepath at Element14, and the company was acquired by Broadcom. Firepath still powers the central office end of most DSL broadband. She is a Broadcom Fellow and has many other honors. Her lecture was titled...

The Future of Microprocessors

Sophie started with the fact that it is 40 years since microprocessors were introduced. Since then they have got over 10,000 times faster and billions of times more common. "Everything in our lives is run by microprocessors." She said there would be two laws in her presentation, and a few graphs.

Moore's Law

Of course, the first law is Moore's Law, which she pointed out was not a law but an observation. It became more law-like when the International Technology Roadmap for Semiconductors (ITRS) made it its mission to make sure that Moore's Law remained true. The ITRS has now been superseded by the International Roadmap for Devices and Systems (IRDS), which has a similar goal, at least for microprocessors.

[Photo: Sophie Wilson with a plot of the ARM1]

This picture shows Sophie with a plot of the ARM1 (although, as with many firsts in a series, it was simply called the ARM in that era). I have a little bit of a claim to fame here. I installed the VLSI Technology EDA software on a couple of Apollo workstations, which were then used to design the first ARM chip by several engineers who would later go on to be the senior management of Arm, the company, when it was spun out of Acorn. For more about the history of Arm, see my post Happy 25th Birthday, ARM. The nearest equivalent chip to the ARM1 today is the Arm Cortex-M0+, and on the same scale it is the size of the little black dot that she is holding up in the photo. In fact, that is in 20nm, so it is actually way too large; it could be in 5nm and a lot smaller still. But it is still a dramatic illustration of Moore's Law over nearly 40 years.

The first microprocessor that Sophie was involved with was the 6502 (always pronounced sixty-five-oh-two). This was the chip that powered both the Apple II and the BBC Micro. It had 4K transistors. It ran at 1MHz in the Apple II and at 2MHz in the BBC Micro. It was all 8-bit, so doing 16-bit arithmetic required multiple instructions (and address arithmetic was complicated). It was 21mm2 in 6um. It was partially pipelined, in that it overlapped instruction fetch with operating on the previous instruction, so it was quicker than many of its contemporaries. You can tell from the die shot that it was handcrafted. Sophie talked about it being designed in the rubylith era. I would have guessed the Calma GDS II era, but GDS II was introduced in 1978 and the 6502 in 1975 (so it was probably designed in 1974). There was a GDS (what we might call GDS I) system introduced in 1971, but I know almost nothing about it.

The ARM1 was 25K transistors, designed in 1985. You can see from the die photo that it is implemented in a very regular way, although still laid out by hand. This is what I would call Mead & Conway style, the way design was being done once VLSI design escaped into computer science departments and research labs like PARC. In that era, synthesis didn't exist, and place and route (P&R) was not good enough for microprocessor design. It was 37mm2 in 3um. It ran at 8MHz in the Acorn/BBC A310 (which was not the BBC Micro; the A310 was not a success). It was pipelined, so it effectively executed one instruction per cycle (since fetch, operate, and writeback were overlapped).
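To make the pipelining point concrete, here is a minimal sketch of my own (not from the lecture) showing how overlapping fetch, operate, and writeback lets a three-stage pipeline retire roughly one instruction per cycle once it has filled:

    # Three-stage pipeline sketch: instruction i is fetched on cycle i,
    # operated on in cycle i+1, and written back in cycle i+2.
    STAGES = ["fetch", "operate", "writeback"]

    def pipeline_timeline(num_instructions):
        """Print which instruction occupies each stage on every cycle."""
        total_cycles = num_instructions + len(STAGES) - 1
        for cycle in range(total_cycles):
            slots = []
            for stage_index, stage in enumerate(STAGES):
                instr = cycle - stage_index
                slots.append(f"{stage}: i{instr}" if 0 <= instr < num_instructions else f"{stage}: --")
            print(f"cycle {cycle}: " + " | ".join(slots))

    # Five instructions finish in 7 cycles instead of 15 unpipelined;
    # in the steady state, one instruction completes every cycle.
    pipeline_timeline(5)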

The next processor Sophie showed was Firepath. This was 6 million transistors. They are too small for optical imaging, so the die photo is pretty fuzzy. You can see that the layout is no longer regular, since it was handled by synthesis and P&R. It ran at 330MHz and was 7mm2 in 0.13um (or 130nm, as we would say today).

More transistors mean more processors. Sophie showed a DSL central office chip with two processors. Then four processors. "Nobody really even knows how many processors are in a phone today".

Amdahl's Law

The second law really is a law...one that can't be broken. It is Amdahl's Law, which says that the speedup of a single program running on lots of parallel processors is limited by its sequential part.
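In symbols (my notation, not a formula from her slides): if a fraction p of the work can be spread across n processors, the speedup is

    S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}

so no matter how many processors you add, the speedup can never exceed 1/(1 - p).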

If we have an absurdly parallelizable problem like ray tracing that is, say, 95% parallel and 5% sequential, then the speedup is limited to 20X, even if you have a million processors.

The graph above shows (top line) the performance approaching 20X. It is actually worse than it looks at first glance, since the horizontal scale is logarithmic (the processor count doubles at each tick on the axis) while the vertical scale is linear. It doesn't actually make it to a million processors, but it does make it to 64K (well, 65,536, as all computer scientists know).

With more realistic values, say 75% parallelizable in a web browser (fetching images and rendering in parallel), the maximum speedup is 4X (and it takes about 64 processors to get close to that). A lot of programs are more like 50% parallelizable, so the maximum speedup is 2X, and they can't really make use of more than about 4-8 processors.
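As a quick check on those numbers, here is a minimal sketch of my own that evaluates Amdahl's Law for the 95%, 75%, and 50% parallelizable cases quoted above:

    # Amdahl's Law: speedup(n) = 1 / ((1 - p) + p / n),
    # where p is the parallelizable fraction and n the processor count.
    def amdahl_speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    for p in (0.95, 0.75, 0.50):
        limit = 1.0 / (1.0 - p)
        points = ", ".join(f"n={n}: {amdahl_speedup(p, n):.2f}x" for n in (8, 64, 65536))
        print(f"p={p:.2f} (limit {limit:.0f}x) -> {points}")
    # p=0.95 never exceeds 20x, p=0.75 tops out near 4x, and p=0.50 near 2x,
    # which is why adding more processors stops helping for most programs.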

Part Two

The second part of Sophie's presentation will be in my post tomorrow.

The lecture was recorded and you should be able to view it.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.