At the recent HOT CHIPS, the first day opened with the chips that you first think of when you hear the word processor. These are the next generation of chips from the likes of Intel, AMD, and IBM. There were lots of other chips too, such as Arm's Neoverse N2, and NVIDIA's new data-processing unit (DPU), or AMD's next-generation graphics architecture. But for this post, anyway, I'm going to focus on these general-purpose processors from the big guys in the industry that power most of the laptops, servers, and gaming consoles.
I am not going to try to go into detail on all these architectures, just point out the things that leaped out at me during the presentations. For the Intel processors, quite a lot of the same ground was covered during Intel's Architecture Day, which was held the week before. You can watch it for free on YouTube (you don't need to have been registered for HOT CHIPS). Warning: it's over two hours long.
Alder Lake is an all-new client processor architecture designed to scale from 9W to 125W. It is built on the Intel 7 process (remember, 7 is the new 10). It has two types of core: the P-core for best single-thread performance (P is for performance) and the E-core for throughput (E is for efficiency). They both have a deeper front-end (instruction processing before the execution units) and a deeper back-end (pipeline stages, etc.).
Of course, as with all designs that have two different core sizes, the idea is that a heterogeneous workload can be processed faster and more efficiently by running the time-critical stuff on the big cores and using the smaller cores for things that don't need to burn that much power or area to get the job done. More about what Intel is doing in this space below. The diagram above shows that P-cores run over 50% faster than E-cores, but that two P-cores plus eight E-cores have over 50% higher throughput than four P-cores.
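To put rough numbers on that trade-off, here is a toy Python calculation. It is only a sketch: the per-core figures are my own assumptions (an E-core normalized to 1.0, a P-core taken as exactly 50% faster), and it assumes perfect scaling with no power, thermal, or memory limits, so it is not Intel's data or methodology.

```python
# Illustrative arithmetic only. The per-core numbers below are assumptions,
# not figures from Intel's presentation.
E_CORE_THROUGHPUT = 1.0  # assumed: one E-core normalized to 1.0
P_CORE_SPEEDUP = 1.5     # assumed: a P-core is 50% faster than an E-core


def multithread_throughput(p_cores: int, e_cores: int) -> float:
    """Idealized aggregate throughput, assuming perfect scaling across cores."""
    p_total = p_cores * P_CORE_SPEEDUP * E_CORE_THROUGHPUT
    e_total = e_cores * E_CORE_THROUGHPUT
    return p_total + e_total


hybrid = multithread_throughput(p_cores=2, e_cores=8)
all_p = multithread_throughput(p_cores=4, e_cores=0)
gain = hybrid / all_p - 1  # fractional throughput gain of hybrid over all-P

print(f"hybrid={hybrid}, all_p={all_p}, gain={gain:.0%}")
```

Under these idealized assumptions the hybrid configuration comes out comfortably above the "over 50%" threshold. A real chip would land lower than this toy model suggests, since the cores share a power and thermal budget.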
The diagram above shows the scalability. Note that in the processor world "mobile" means laptop, not smartphone. In terms of interfaces, everything is up to date, with support for PCIe 5.0, DDR5, and LPDDR5, along with, of course, support for older standards such as PCIe 4.0 and DDR4.
The different chips are built out of different building blocks mixed and matched, as you can see above. The same binary image for the operating system (and any other software) runs unchanged on all the cores.
The biggest new thing is Intel Thread Director. This provides hardware support for keeping track of which threads should run on P-cores, and which on E-cores, and which should be moved from one core to another. It monitors the runtime instruction mix of each thread as well as the state of each core (including thermal) with nanosecond precision. It doesn't do the thread scheduling in hardware, but the runtime feedback is used by the OS to make scheduling decisions. Over half the presentation was on details of how Thread Director is implemented, and what tables it maintains.
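The division of labor described above (hardware supplies per-thread feedback, the OS makes the placement decision) can be sketched in a few lines of Python. This is purely illustrative: the `ThreadFeedback` record, its `perf_gain_on_p` field, and the greedy `assign_cores` policy are hypothetical names and logic of my own, not Intel's tables or any real OS scheduler.

```python
from dataclasses import dataclass


@dataclass
class ThreadFeedback:
    """Hypothetical per-thread hint, standing in for the hardware feedback."""
    name: str
    perf_gain_on_p: float  # assumed: estimated speedup on a P-core vs. an E-core


def assign_cores(threads, p_slots, e_slots):
    """Greedy placement: threads that benefit most from a P-core get one first."""
    ranked = sorted(threads, key=lambda t: t.perf_gain_on_p, reverse=True)
    placement = {}
    for thread in ranked:
        if p_slots > 0:
            placement[thread.name] = "P"
            p_slots -= 1
        elif e_slots > 0:
            placement[thread.name] = "E"
            e_slots -= 1
        else:
            placement[thread.name] = "wait"  # no free core in this toy model
    return placement


# Made-up workload: the gain values are invented for illustration.
threads = [
    ThreadFeedback("game_render", 1.6),
    ThreadFeedback("background_sync", 1.05),
    ThreadFeedback("video_encode", 1.4),
]
placement = assign_cores(threads, p_slots=2, e_slots=2)
print(placement)
```

With two P-core slots available, the two threads with the largest assumed benefit land on P-cores and the background task goes to an E-core. A real scheduler would also weigh priorities, thermal state, and the cost of migrating a thread between cores.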
You can see a video demo of Thread Director on YouTube:
AMD discussed the new Zen 3 core. This is the latest progression in their Zen roadmap (I'm starting to sound like a guru). The original Zen and Zen+ were in 14/12nm. Then Zen 2 was in 7nm. Zen 3 (the one being announced at HOT CHIPS) is still in 7nm, but Zen 4 will be in 5nm.
Above are the specs.
I continue to be amazed at how big an increase in instructions-per-cycle (IPC) is still possible. You would think that nothing could make that much difference anymore, but Zen 3 has a 19% IPC uplift over Zen 2. The presentation went into detail about which factors contributed to this increase, as you can see in the bar graph above.
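As a reminder of why IPC matters, single-thread performance is roughly IPC multiplied by clock frequency. The Python sketch below applies the 19% uplift quoted in the presentation; the 5% clock increase in the second case is a number I made up for illustration, not an AMD figure.

```python
def relative_performance(ipc_uplift: float, clock_ratio: float = 1.0) -> float:
    """Performance relative to the old core, modeled as IPC ratio x clock ratio."""
    return (1.0 + ipc_uplift) * clock_ratio


# 19% IPC uplift at an unchanged clock frequency.
same_clock = relative_performance(0.19)

# Same IPC uplift with a hypothetical 5% higher clock (assumed, not from AMD).
faster_clock = relative_performance(0.19, clock_ratio=1.05)

print(f"same clock: {same_clock:.2f}x, with assumed +5% clock: {faster_clock:.2f}x")
```

The point is simply that IPC gains multiply with any clock gains rather than adding to them, which is why a 19% uplift without a process shrink is such a big deal.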
Here are the major changes between Zen 2 and Zen 3.
They discussed a future V-cache technology with TSVs and direct copper-to-copper bonding.
So that was the core. It is actually used to deliver a range of products, some using 3D packaging technology.
The IBM Z processors are the descendants of the original IBM System/360 family. I've read that code from the 1960s will run unchanged on them, probably through some form of virtualization. Certainly, a lot of the code that runs on Z mainframes is written in COBOL, a language that dates back to the 1950s. The new processor being announced at HOT CHIPS is called the IBM Telum.
You have to have a die-shot at HOT CHIPS, and so here is IBM's. The primary differences from its predecessor are listed down the left-hand side. I don't have space to go into all the details that were elaborated. The chip is built in Samsung 7nm technology, is 530 mm² in area, has 22.5B transistors, and runs at a clock rate of over 5GHz.
One of the things that IBM has always excelled at is finding innovative ways to deliver its processors so that they are reliable, serviceable, and very high performance. So the chip can be delivered on its own, in a multi-chip module with two chips, in a drawer with four chips, or in a bigger system with four of those drawers.
Another new feature is an integrated AI accelerator. At HOT CHIPS, IBM went into quite a lot of detail on this, but the more I hear about AI accelerators, the more they all seem much the same: lots of TOPS, lots of bandwidth, and support for sparsity.
Sapphire Rapids is the next-generation Intel data center chip (Xeon Scalable Processor in Intel-speak).
The first big thing about it is that it is a multi-tile design (a tile is what other people call a die or chiplet). However, it is monolithic at the architecture level, and each tile has access to the resources of all the other tiles.
Here's the block diagram. Of course, like all processors, it is built for AI, too. Instead of having some sort of co-processor, Intel says that this has full Intel architecture programmability.
Later in the week, I'll give my opinion on what trends we can discern from these presentations, and others at HOT CHIPS (including the tutorials on the Sunday).