The History of PCIe: Getting to Version 6

15 Mar 2021

PCIe, or Peripheral Component Interconnect Express (which nobody ever says), was an upgrade to the earlier PCI bus. PCI was developed by Intel and introduced in 1992. It replaced several older, slower buses that had been used in a somewhat ad-hoc fashion in early PCs. It was primarily a 32-bit bus, although the standard allowed for 64-bit. Most importantly, it was a parallel bus. Today, PCI is only of historical interest and is no longer in active use, so I won't go into any detail.

In the early 2000s, a group of Intel engineers formed the Arapaho working group and started to develop a new standard. Eventually, other companies joined the group. The design went through several names before settling on PCI Express (PCIe). In some ways, it is a successor to PCI, in that it serves a similar function. In other ways, it is a completely different type of design. In particular, it is a serial bus, more like a network-on-a-board than the old-style parallel interface of PCI (and pretty much all other buses of that era).

The initial standard, PCIe 1.0a, had a transfer rate of 2.5GT/s (gigatransfers per second) per lane, which gives a data rate of 250MB/s per lane in each direction. Like other serial buses, PCIe quotes its raw rate in transfers per second rather than bits per second, to avoid counting overhead bits as "data". PCIe 1.0a used an 8b/10b encoding scheme, so only 80% of the bits transmitted were true "data". The overhead bits serve two main functions. First, they ensure that there are always enough transitions for the receiver to recover the clock. Second, they keep the signal DC-balanced, so there is no net DC current.
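To make the arithmetic concrete, here is a minimal Python sketch that turns a line rate in GT/s and an encoding ratio into usable megabytes per second for one lane. The function name `lane_throughput_mb_s` is hypothetical; the sketch assumes one bit per transfer (true of the NRZ signaling used through PCIe 5.0) and ignores any packet or protocol overhead above the encoding layer.

```python
def lane_throughput_mb_s(gt_per_s: float, payload_bits: int, coded_bits: int) -> float:
    """Usable data rate of one lane, one direction, in MB/s (1 MB = 10**6 bytes).

    gt_per_s is the raw line rate in gigatransfers per second, assumed to be
    one bit per transfer. Only payload_bits out of every coded_bits carry
    data; the rest is encoding overhead (e.g. 8 of every 10 for 8b/10b).
    """
    # Multiply before dividing to keep the arithmetic exact for these rates.
    data_bits_per_s = gt_per_s * 1e9 * payload_bits / coded_bits
    return data_bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


# PCIe 1.0a: 2.5 GT/s with 8b/10b, so 80% of the bits are data.
print(lane_throughput_mb_s(2.5, 8, 10))  # 250.0
```

Because PCIe 2.0 kept 8b/10b and only doubled the line rate, calling the same function with 5.0GT/s gives 500MB/s per lane.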

Subsequently, there have been regular upgrades to the standard, each with a higher transfer rate. Since the main use of PCIe was in Intel-processor-based PCs and servers, in practice each new standard became effective once Intel released processors that supported it. The general philosophy of the standard's evolution has been to pick transfer rates that are manufacturable in the mainstream process nodes of the era. However, because of its ubiquity, PCIe is used in most designs that require a high-performance peripheral bus, regardless of the underlying processor architecture. For example, PCIe requirements are specified in the Arm Server Base System Architecture Specification.

PCIe 2.0, introduced in 2007, doubled the transfer rate to 5GT/s but kept the same 8b/10b coding scheme.

PCIe 3.0, introduced in 2010, switched to a much more efficient 128b/130b coding scheme and relied on scrambling with a known binary polynomial to get a good balance of 0s and 1s, for clock recovery and no DC bias. It also raised the transfer rate to 8GT/s. A 16-lane PCIe 3.0 interface can transfer 15.7GB/s in each direction. In practice, though, if a design needs that sort of bandwidth, it is easier to step up to PCIe 4.0. Today, PCIe 3.0 is the most widely deployed version of PCIe in shipping devices. To take just one example, the Google TPU version 3 uses PCIe 3.0, and the current USB4 standard is based on PCIe 3.0. It might seem surprising, but it takes nearly a decade for a PCIe generation to go from ratification of the standard to becoming mainstream. It is a bit like the credit card problem: merchants don't want to bother accepting credit cards until lots of people have them, and people don't want credit cards until lots of merchants accept them.
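Scrambling is easy to see in code. Below is a minimal sketch of an additive scrambler: a linear-feedback shift register (LFSR) produces a pseudo-random bit stream, the transmitter XORs it with the data, and the receiver recovers the data by XORing with the same stream. The polynomial x^23 + x^21 + x^16 + x^8 + x^5 + x^2 + 1 is assumed here to be the PCIe 3.0 one; the seed is hypothetical, and the specification's per-lane seeding, reset rules, and bit ordering are not modeled, so this is an illustration rather than a bit-exact PCIe scrambler.

```python
# Taps of the assumed PCIe 3.0 scrambler polynomial
# x^23 + x^21 + x^16 + x^8 + x^5 + x^2 + 1.
TAPS = (23, 21, 16, 8, 5, 2)
DEGREE = 23
MASK = (1 << DEGREE) - 1


def keystream(seed: int, nbits: int) -> list[int]:
    """Generate nbits of pseudo-random bits from a Fibonacci LFSR."""
    state = seed & MASK
    if state == 0:
        raise ValueError("an all-zero LFSR state never changes")
    out = []
    for _ in range(nbits):
        # Emit the oldest bit held in the register.
        out.append((state >> (DEGREE - 1)) & 1)
        # The new bit is the XOR of the tapped stages (tap k is bit k-1).
        feedback = 0
        for tap in TAPS:
            feedback ^= (state >> (tap - 1)) & 1
        state = ((state << 1) | feedback) & MASK
    return out


def scramble(bits: list[int], seed: int) -> list[int]:
    """XOR data with the keystream; the same call also descrambles."""
    return [b ^ k for b, k in zip(bits, keystream(seed, len(bits)))]


SEED = 0x1ABCDE  # hypothetical seed, not taken from the specification

data = [0] * 64                # a long run of zeros: no transitions at all
line = scramble(data, SEED)    # what would go on the wire
assert scramble(line, SEED) == data
print(f"{sum(line)} ones in {len(line)} transmitted bits")
```

The all-zeros input is exactly the pattern that would starve a receiver of transitions, and after scrambling it goes out as a mix of 0s and 1s. Scrambling only makes long runs statistically unlikely rather than impossible, which is one reason each 128b/130b block still carries a 2-bit sync header that guarantees a transition.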

PCIe 4.0 kept the same 128b/130b coding scheme but doubled the transfer rate again to 16GT/s. Cadence has compliant IP for PCIe 4.0; see my posts Interoperability is the Only Way to Prove Standards Compliance from 2016 and PCIe Gen 4: It's Official, We're Compliant from 2019. Another important aspect of PCIe is that other protocols are built on its basic transfer mechanism and PHYs. For example, see my post 16Gbps SerDes Multiprotocol Multilink PHY IP. Or, for a specific example, see my post CCIX Is Pronounced C6. CXL also piggybacks on PCIe. As you might surmise from all these posts, PCIe 4.0 is very much the mainstream for current designs. Intel's Tiger Lake mobile processors support it, as does AMD's Zen 2 CPU family. This makes it attractive for any sort of peripheral chip, such as SSD controllers or networking, and the knock-on effect then makes it attractive for non-x86 systems too.

Design-ins for PCIe 5.0, with 32GT/s performance, have already started (the standard was approved in May 2019). There is also interest in PCIe 6.0 (final standard not yet approved), with 64GT/s performance and a switch to PAM4 signaling, which uses four voltage levels and thus carries two bits per symbol. Cadence already has extensive experience with PAM4 signaling since it is used in our 112G SerDes. See my posts The World's First Working 7nm 112G Long Reach SerDes Silicon and Signal Integrity for 112G.
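To illustrate what two bits per symbol means, here is a minimal sketch that maps bit pairs onto four levels and back. It assumes a Gray-coded mapping, in which neighboring levels differ in only one bit, so the most likely error (mistaking a level for its neighbor) corrupts just one bit. The level numbers 0 to 3 are abstract stand-ins for voltages, and the table is an illustrative assumption rather than the exact PCIe 6.0 encoding.

```python
# Assumed Gray-coded PAM4 mapping: levels 0..3 stand in for four voltages,
# and neighboring levels differ in exactly one bit.
BITS_TO_LEVEL = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
LEVEL_TO_BITS = {level: pair for pair, level in BITS_TO_LEVEL.items()}


def pam4_encode(bits: list[int]) -> list[int]:
    """Map each pair of bits to one of four levels (one symbol)."""
    if len(bits) % 2:
        raise ValueError("PAM4 carries bits in pairs")
    return [BITS_TO_LEVEL[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(levels: list[int]) -> list[int]:
    """Map each level back to its two bits."""
    bits: list[int] = []
    for level in levels:
        bits.extend(LEVEL_TO_BITS[level])
    return bits


bits = [1, 0, 0, 1, 1, 1, 0, 0]
symbols = pam4_encode(bits)
print(symbols)  # [3, 1, 2, 0]
assert pam4_decode(symbols) == bits

# Gray property: adjacent levels differ in exactly one bit.
for level in range(3):
    a, b = LEVEL_TO_BITS[level], LEVEL_TO_BITS[level + 1]
    assert sum(x != y for x, y in zip(a, b)) == 1
```

Because each symbol carries two bits, 64GT/s needs only 32 billion symbols per second, the same symbol rate as NRZ at PCIe 5.0's 32GT/s, which is the main attraction of PAM4 here.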

Since these versions are the mainstream from a design and IP point of view, the rest of this post will focus on PCIe 4.0 and 5.0 (and a bit on the future version 6.0).

PCIe Versions 4.0, 5.0, and 6.0

PCIe 5.0 is seeing accelerated adoption as more systems are upgraded and more products become available. Having said that, PCIe 3.0 and 4.0 are still the most mature PCIe interfaces, widely deployed in a huge number of applications as the main interconnect for various types of I/O. As I said above, PCIe 6.0 is waiting in the wings and is already seeing a lot of interest.

Obviously, at some level, each generation of PCIe has higher performance, but it is not just a nice-to-have number on a datasheet. It enables more powerful applications.

For Ethernet, PCIe 4.0 can be used for 100G and 200G. PCIe 5.0 takes that up to 400G, available today. And in the future, PCIe 6.0 will take that up to 800G.
For solid-state drives (SSDs), PCIe 4.0 enables transfer rates up to about 7GB/s, PCIe 5.0 takes that up to about 14GB/s, and PCIe 6.0 should take that further, to about 28GB/s.
Artificial intelligence (AI) and machine learning (ML) workloads transfer massive amounts of data, and the PCIe interface is the bottleneck whether the training or inference is implemented on a CPU, GPU, FPGA, or an ASIC/SoC such as Google's TPU. This holds across pretty much all applications, such as autonomous vehicles, medical imaging, genome sequencing, and data mining.
Storage-class memory requires the high performance of PCIe 5.0 and PCIe 6.0. For more details, see my post Persistent Memory: We Have Cleared the Tower, from last year's Persistent Memory Summit.
In automotive, current ADAS (advanced driver assistance systems) are using PCIe 4.0, but future autonomous driving will require higher performance to handle all the cameras, radar, and lidar.
The hyperscale data centers used for cloud computing by companies like AWS, Microsoft Azure, and Google Cloud can make use of all the bandwidth that they can get, especially to implement connections between the primary CPU (Intel, AMD, or Arm) and accelerators such as NVIDIA GPUs or Xilinx/Intel FPGAs.
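As a rough check on the Ethernet and SSD figures above, the following sketch computes the usable bandwidth of a link after 128b/130b encoding. It assumes x16 links for network cards and x4 links for SSDs (typical widths, but an assumption here), and the function name `link_gbps` is hypothetical. The results are ceilings: packet and protocol overhead above the encoding layer is ignored, which is one reason real SSDs land somewhat below them.

```python
def link_gbps(gt_per_s: float, lanes: int,
              payload_bits: int = 128, coded_bits: int = 130) -> float:
    """Usable bandwidth of one link direction in Gb/s after line encoding
    (128b/130b by default). Packet-level overhead is not modeled."""
    return gt_per_s * lanes * payload_bits / coded_bits


# Raw per-lane rates from the text; both generations use 128b/130b.
for gen, rate in (("4.0", 16.0), ("5.0", 32.0)):
    nic = link_gbps(rate, lanes=16)      # assumed x16 network card
    ssd = link_gbps(rate, lanes=4) / 8   # assumed x4 SSD, in GB/s
    print(f"PCIe {gen}: x16 {nic:.0f} Gb/s, x4 {ssd:.2f} GB/s")
# PCIe 4.0: x16 252 Gb/s, x4 7.88 GB/s
# PCIe 5.0: x16 504 Gb/s, x4 15.75 GB/s
```

That leaves headroom for 200G Ethernet on PCIe 4.0 and 400G on PCIe 5.0, and it is consistent with the roughly 7GB/s and 14GB/s SSD figures. PCIe 6.0 is left out because it moves to a FLIT-based encoding that this sketch does not model; on the wire, a x16 link at 64GT/s carries 1024Gb/s.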

Learn More

See the product page for PCI Express IP, with details of Cadence Controller and PHY IP for PCIe supporting HPC, cloud, AI/ML, storage, mobile, and automotive applications.

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.