Paul McLellan
Numbers Everyone Should Know

6 Sep 2018 • 5 minute read

At the recent HOT CHIPS, Paul Turner of Google Project Zero talked about numbers everyone should know. These numbers, actually latencies, seem originally to come from Peter Norvig, but they have been updated by a number of people since his original table appeared, as processors have got faster (but most other things have not).

One reason that the network delays have not changed much is one number that every software engineer (and semiconductor designer) should know: light travels a foot in a nanosecond.

As Grace Hopper said (in the video I put at the end of this post), "there are a lot of nanoseconds between earth and a geostationary satellite."

When I was an undergraduate, the head of the Cambridge Computer Laboratory was Maurice Wilkes, who had worked on digital computers since EDSAC, one of the first programmable digital computers, turned on in 1949. In a seminar, I remember someone challenging him that computers could not get any faster due to these speed-of-light considerations. In those days, a mainframe CPU might be in one big cabinet and the main memory in another big cabinet on the other side of the computer room (I remember when they added the...gasp...second megabyte of memory to the university time-sharing service's mainframe). Anyway, Wilkes thought for a moment before saying, "I think computers are going to get a lot smaller." Which, of course, they did with the invention of the microprocessor. With a 3 GHz clock, light travels less than 4" per clock cycle.
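
As a quick sanity check, here is that arithmetic as a little Python sketch (the physical constants are standard; the clock frequencies are just examples):

# How far light travels in one clock cycle at a few example frequencies.
C_M_PER_S = 299_792_458      # speed of light in a vacuum, m/s
INCHES_PER_M = 39.3701

for ghz in (1, 3, 5):
    period_ns = 1 / ghz                            # clock period, ns
    inches = C_M_PER_S * period_ns * 1e-9 * INCHES_PER_M
    print(f"{ghz} GHz: {period_ns:.3f} ns/cycle, {inches:.1f} inches of light travel")

# 1 GHz: 1.000 ns/cycle, 11.8 inches ("light travels a foot in a nanosecond")
# 3 GHz: 0.333 ns/cycle,  3.9 inches (the "less than 4 inches" above)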

The Numbers

  • CPU cycle 0.3 ns
  • L1 cache reference 0.5 ns
  • Branch mispredict 5 ns
  • L2 cache reference 3 ns
  • L3 cache reference 28 ns
  • Main memory reference (DRAM) 100 ns
  • Send 2K bytes over 1 Gbps network 20,000 ns
  • Read 1 MB sequentially from memory 250,000 ns
  • Round trip within same datacenter 500,000 ns
  • Disk seek 10,000,000 ns
  • Read 1 MB sequentially from network 10,000,000 ns
  • Read 1 MB sequentially from disk 30,000,000 ns
  • Send packet CA->Europe->CA 150,000,000 ns

If you only remember two of these, pick the fact that a memory access to L1 cache is 0.5 ns, but to DRAM is 100 ns. That's 200 times as long. A huge fraction of any modern microprocessor is doing its best to hide that inconvenient fact.
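
You can see the gap without any special tools. Here is a minimal sketch, assuming NumPy is installed, that does the same arithmetic twice and changes only the memory access pattern:

# Both runs touch exactly the same bytes; only the order differs.
# Sequential order lets the hardware prefetcher stream from DRAM and hide
# most of the latency; shuffled order defeats it and pays for the misses.
import time
import numpy as np

N = 20_000_000                        # ~160 MB of float64, well past any L3
data = np.ones(N)
seq_idx = np.arange(N)                # indices in sequential order
rnd_idx = np.random.permutation(N)    # the same indices, shuffled

for name, idx in (("sequential", seq_idx), ("random", rnd_idx)):
    start = time.perf_counter()
    total = data[idx].sum()           # gather via idx, then reduce
    print(f"{name:10s}: {time.perf_counter() - start:.2f} s (sum={total:.0f})")

On a typical desktop the random version comes out several times slower, even though it does identical work.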

What If a Clock Cycle Were a Second

The problem with numbers like that is that they don't mean anything, even to people who deal with them every day. Billions. Picoseconds. Nanometers. Ångstrom units. Gigabytes. Zettabytes. I deal with these units all the time but don't have any intuitive feel for them.

When I taught a course on computer networking at Edinburgh University, one thing I liked to do was to get people to work out how long various things took if a computer clock cycle was one second. This wasn't an original idea; I think I had been given the exercise when I was an undergraduate. In the days when I was teaching, computers like a VAX 11/780 ran at roughly 1 MIPS, so this was actually a slowdown of a million times. Today, the slowdown is much greater. Computer networks in that era ran at 56 kbps or 64 kbps, so we could use our imaginary one-second-per-clock computer to see just how slow that was even to the computers of that era.
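
The exercise is easy to reproduce. Here is a sketch in Python that rescales a few of the latencies above so that one 0.3 ns CPU cycle stretches to one second (the humanize helper is just for readability):

LATENCIES_NS = {
    "One CPU cycle": 0.3,
    "L1 cache reference": 0.5,
    "L2 cache reference": 3,
    "Main memory reference (DRAM)": 100,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
    "Packet CA->Europe->CA": 150_000_000,
}

SCALE = 1 / 0.3        # imaginary seconds per real nanosecond

def humanize(secs: float) -> str:
    """Convert seconds into the largest convenient unit."""
    for unit, size in (("year", 365 * 86400), ("day", 86400),
                       ("hour", 3600), ("minute", 60)):
        if secs >= 2 * size:
            return f"{secs / size:.0f} {unit}s"
    return f"{secs:.1f} seconds"

for event, ns in LATENCIES_NS.items():
    print(f"{event:35s} {humanize(ns * SCALE)}")

DRAM lands around six minutes and the transatlantic packet around sixteen years, the same order of magnitude as the table below, which rounds a little differently.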

Another interesting exercise was to work out the bandwidth of a truck full of magnetic tapes (these days you could use SD cards) driving at 60 mph on the freeway. Now you know why Amazon has Snowmobile for petabyte-sized data transfers to AWS: it's a container of SSD drives moved on an 18-wheel truck.
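
For flavour, here is the back-of-the-envelope version with some invented but plausible numbers (a million 1 TB microSD cards and a coast-to-coast drive):

# Bandwidth of a truck full of storage, with made-up but plausible numbers.
CARDS = 1_000_000            # hypothetical: a million 1 TB microSD cards
BYTES_PER_CARD = 1e12        # 1 TB each
DISTANCE_MILES = 3_000       # a coast-to-coast drive
SPEED_MPH = 60

trip_seconds = DISTANCE_MILES / SPEED_MPH * 3600   # 50 hours on the road
total_bits = CARDS * BYTES_PER_CARD * 8
print(f"{total_bits / trip_seconds / 1e9:,.0f} Gbps")   # ~44,000 Gbps

Colossal bandwidth, dreadful latency: the first byte arrives 50 hours after it was sent. As Tanenbaum put it, "never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."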

Some of the ratios really bring home just how big the mismatches are. The CPU runs at one cycle per second, and that's how long a register-to-register operation takes (and modern processors can do several of them at the same time). An operation involving memory in the L1 cache (such as loading a value from memory into a register) takes just two seconds, twice as long. But going out to DRAM takes 7 minutes. That is a huge difference that computer architects have largely hidden with multi-level caches, out-of-order execution, and multiple execution units.

System Event                                         Actual Latency   Scaled Latency
One CPU cycle                                        0.3 ns           1 second
Level 1 cache access                                 0.5 ns           2 seconds
Level 2 cache access                                 2.8 ns           10 seconds
Level 3 cache access                                 28 ns            2 minutes
Main memory access (DDR DIMM)                        100 ns           7 minutes
SSD I/O                                              50–150 μs        1.5–4 days
Rotational disk I/O                                  1–10 ms          1–9 months
Internet packet: San Francisco to Europe and back    150 ms           ~10 years
Time to type a word                                  1 second         ~1 century
Time to open PowerPoint on my Mac                    ~10 seconds      ~1 millennium

To put the amount of computer power that is wasted into perspective, NASA had a total of 1 MIPS of compute to go to the moon. It takes 10 days' worth of all NASA's computers to open PowerPoint.

Three Numbers in Computer Science

There is a well-known aphorism that there are only three numbers in computer science. I have tried in the past to track down who first came up with this, but it seems to be lost in the fog of time.

The three numbers are 0, 1, and ∞ (infinity). The reasoning is this: there should be no arbitrary limits. Either something is impossible (zero), or there can truly be only one of it (one), or, if any other number is allowed, you should assume it might be arbitrarily large (infinity).

Actually, in my experience, often when there is only 1 of something, you should opt for the infinite case anyway. Some examples:

  • When designing a chip, there is only one chip, and thus only one process technology. But now we have 3D packaging and chiplets and that is no longer true.
  • There is only one processor in a microprocessor...but now we have multicore, and offload processors, and supercomputers.
  • In the early days of chip design, there was only one power supply, and it didn't even appear in the netlists. It took a lot of messy specification to handle multiple power supplies in CPF and UPF, including level shifters and retention gates that didn't appear in the netlist either. It might have been a lot cleaner to assume there was more than one power supply in the first place, as in the sketch after this list.
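
Here is a tiny sketch of what opting for the infinite case looks like in code; the classes are made up for illustration, not any real EDA tool's data model:

# "Design for N, not for exactly one." Storing supplies as a list from
# day one costs nothing, and the classic single-supply chip is simply
# the length-1 case.
from dataclasses import dataclass, field

@dataclass
class PowerSupply:
    name: str
    voltage: float

@dataclass
class ChipDesign:
    name: str
    supplies: list[PowerSupply] = field(default_factory=list)  # 0..N supplies

legacy = ChipDesign("old_soc", [PowerSupply("VDD", 1.2)])
modern = ChipDesign("new_soc", [PowerSupply("VDD", 0.8),
                                PowerSupply("VDDIO", 1.8),
                                PowerSupply("VDD_RET", 0.6)])  # retention supply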

Oh, and here's a fun fact. There are only three numbers in computer science...but three is not one of them.

Grace Hopper Explains Nanoseconds

[Embedded video: Grace Hopper explaining nanoseconds]

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.