At the recent HOT CHIPS, the first day was dedicated to general-purpose processors, and the second day to special-purpose processors. Today, I'm going to take a look at some of the processors presented on the first day.
If we look at server chips, there are basically a handful of suppliers:
As it happened, AWS/Annapurna presented one of the keynotes at the recent CadenceLIVE Americas, and they presented at last year's HOT CHIPS, too. See my posts Climbing Annapurna to the Clouds and HOT CHIPS: The AWS Nitro Project. AMD presented at last year's HOT CHIPS (including Lisa Su giving a keynote). Everyone else on the above list presented their latest processor at HOT CHIPS 2020. There are probably others trying to get into the market, but most of the action for smaller players is in domain-specific processors, especially for deep-learning training. I'll cover that segment of the market in a different post. One notable, relatively recent exit from the general-purpose server market was Oracle/Sun and the end of SPARC.
What the processor industry calls mobile chips are chips for laptops. In the rest of the semiconductor industry, when we say mobile, we mean smartphones. Chips that actually go into smartphones are usually called application processors or APs. There are really just two suppliers to the mobile processor market: Intel and AMD.
You've probably heard that Apple/Arm will be joining the list before the end of the year, with Arm-based chips for MacBooks, but it hasn't done so yet. See my post How Do You Run One Architecture on Another? for a bit more on that. Both Intel and AMD presented their latest mobile processors at HOT CHIPS 2020.
There are big differences between server and mobile processor chips. Servers have a much bigger power budget, and mobile processors incorporate a GPU. That leads to more cores for servers, a different performance/power point and so on. This post is a half-day of presentations compressed into a single blog post. So it is not going to be comprehensive. Usually at HOT CHIPS, there is a theme you pick up from the presentations. Last year it was sophisticated 3D packaging. This year, that was still around but more muted. Here are a couple of things I don't remember seeing before:
First, several of these processors support fully encrypted memory (DRAM) with full hardware support from the processor so that there is no latency or bandwidth overhead (a little bit of power, presumably).
Second, several of them have various forms of what I think of as next-generation power management mixed with thermal sensing. For example, on a multi-core processor, if you need some cores to run extra-fast, you can increase their clock rate. That will cause thermal issues, so the other cores will automatically get throttled. In a sense, compute power has been transferred from low-priority cores to higher-priority cores.
Another thing I don't remember seeing before is continuously variable clocking during frequency changes. Traditionally, frequency changes (DVFS, for dynamic voltage and frequency scaling) were more like a car gearbox: there are just a few gears, and there is a whole procedure for changing gear. Often queues would all need to be flushed, the processors would need to be stopped and restarted, that sort of thing. There have been various attempts over the years to create continuously variable gears for cars but none caught on, although I guess we got there by using electric motors instead. And by the way, agricultural equipment has had some form of this for decades. On a combine harvester, you have to run the engine at full power to thresh the grain, but you need to control the speed at which the vehicle advances depending on how thick the crop is in front of the cutter-bar.
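To make the idea of moving power from low-priority cores to high-priority ones concrete, here is a minimal Python sketch. It is a toy model, not how any of the processors in this post actually implement it: the `Core` class, the `priority` field, the 1W floor, and the `boost` function are all hypothetical names made up for illustration. The only thing it tries to capture is that the package total stays fixed while the split between cores changes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Core:
    name: str
    priority: int    # hypothetical: higher number means more important work
    power_w: float   # current share of the package power budget, in watts


def boost(cores: List[Core], target: Core, extra_w: float,
          floor_w: float = 1.0) -> float:
    """Move up to `extra_w` watts to `target` from lower-priority cores.

    The lowest-priority cores are throttled first, and no donor is taken
    below `floor_w` (an assumed minimum). Returns the watts actually moved.
    """
    donors = sorted(
        (c for c in cores if c is not target and c.priority < target.priority),
        key=lambda c: c.priority,
    )
    remaining = extra_w
    for donor in donors:
        give = min(remaining, max(0.0, donor.power_w - floor_w))
        donor.power_w -= give
        target.power_w += give
        remaining -= give
        if remaining <= 0:
            break
    return extra_w - remaining


if __name__ == "__main__":
    cores = [Core("c0", 2, 5.0), Core("c1", 1, 5.0), Core("c2", 0, 5.0)]
    moved = boost(cores, cores[0], 6.0)
    print(moved, [(c.name, c.power_w) for c in cores])
```

In this model the sum of `power_w` across all cores is the same before and after the call, which stands in for the fixed thermal envelope. Real hardware closes the loop with on-die thermal sensors rather than a static watt budget.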
So, without further ado, this year's menagerie of processors. I'll give a limited amount of commentary, usually a die-shot, and then finish with their own summary slide that I presume is the way they want their processor summarized.
Ice Lake is built in 10nm+. It matters that it is + and not ++, but I've lost track of Intel's 10nm process variants... as apparently has Intel, since at their recent Intel Architecture Day they announced a 10SF version (SF is for SuperFin), apparently joking that there were so many "+" versions that a new naming convention was required. Ice Lake uses the Sunny Cove core and the 2-socket Whitley platform.
Here's the obligatory die-shot. The Sunny Cove core (28 of them in the die-shot) has a wider front end and an improved branch predictor, leading to an 18% increase in IPC (instructions per cycle). I continue to be amazed that such big increases are still possible at this point in the evolution of processor design. It has a whole lot of new instructions, most notably for cryptography support, leading to some cryptographic algorithms running 8X as fast. The infrastructure of the chip supports fast, continuous change of frequency and reduces the frequency change blockage from 12µs to nothing.
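To get a feel for what a 12-microsecond blockage costs, here is a back-of-the-envelope sketch in Python. The 12 microseconds comes from the presentation; the 3GHz clock and the 1,000 frequency changes per second in the usage note are my own assumptions picked for round numbers, and both function names are hypothetical.

```python
def stalled_cycles(stall_us: int, clock_mhz: int) -> int:
    """Cycles during which a core is blocked by one frequency change.

    Microseconds times MHz (cycles per microsecond) gives cycles.
    """
    return stall_us * clock_mhz


def fraction_of_time_lost(stall_us: int, changes_per_second: int) -> float:
    """Share of wall-clock time spent blocked, for a given change rate."""
    return stall_us * changes_per_second / 1_000_000
```

Under those assumptions, `stalled_cycles(12, 3000)` is 36,000 cycles per change, and `fraction_of_time_lost(12, 1000)` is a little over 1% of the time. That is why power management with a blocking transition has to change frequency sparingly, and why removing the blockage lets it adjust much more often.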
I talked about advanced power/thermal management. Here's the relevant slide for this processor.
Here's a summary of the Power10's capabilities and the die-shot. The chip is fabricated by Samsung Foundry. Overall, it has 2.6X the performance per watt of its predecessor, the Power9.
The chip is designed to go into systems that can scale up to full supercomputers. Summit, until recently the #1 supercomputer, was built using the previous version of this processor. The PowerAXON terabyte-per-second interface allows one processor to map another processor's memory, too. The big advantage of this feature is that it means each processor can be configured for a normal amount of memory, rather than the most that might ever be required, since it can be reconfigured to a bigger memory system by linking two systems together.
The ThunderX3 is the successor to...surprise...the ThunderX2. The table above shows the comparison (and is as near to a die-shot as you are going to get for this processor). It is built in TSMC 7nm. One interesting feature is that it has four hardware threads per core, and each thread looks like a full Arm CPU to the software, so there are, as Marvell put it, "four CPUs per core". The area impact of this is 5%. The ThunderX3 has 60 cores, so 240 threads (or 240 CPUs as the software sees it).
The z-series of servers from IBM are the direct descendants of the IBM 360 in the 1960s and the IBM 370 in the 1970s. When I was an undergraduate computer-scientist in the 1970s, the main computer for the entire university was an IBM 370/165. I've heard repeatedly in IBM presentations that the programs that I wrote nearly 50 years ago would still run on the latest z-series server unchanged.
As it happens, running old programs unchanged is a big part of the appeal of the z-series. I've heard that COBOL programmers are hugely in demand. As Anthony, the presenter, pointed out, there are 220B lines of COBOL, and 70% of the world's business transactions take place in COBOL. COBOL even pre-dates the IBM 360, since it was first released in 1959. Another fun fact: every second there are 1.3 million CICS transactions (CICS is the transaction management system). By contrast, there are "only" 68K Google searches per second globally. So "ancient" mainframes running "ancient" programming languages remain extremely important.
So here's the chip-shot. Well, it's a big iron mainframe so you actually get a drawer-shot.
Oh wait, you get a chip-shot after all, on the final conclusion slide (actually there were more earlier).
Renoir is 9.8B transistors (small compared to AMD server processors) built in 7nm, about 12.5mm on a side. The previous generation was Picasso, and Renoir has twice the transistors on a 25% smaller die. It contains the latest version of the Vega GPU, with a 225% increase in GPU performance per square mm (some from improved architecture, some from moving to a new process node).
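Those two ratios imply a big jump in transistor density, and the arithmetic is easy to check. The sketch below takes the figures quoted above at face value (twice the transistors, a 25% smaller die, about 12.5mm on a side) and treats the die as exactly square, which is an approximation on my part. The function names are just for illustration.

```python
def density_gain(transistor_ratio: float, area_ratio: float) -> float:
    """Ratio of new to old transistor density (transistors per unit area)."""
    return transistor_ratio / area_ratio


def square_die_area_mm2(side_mm: float) -> float:
    """Area of a die assumed to be a perfect square."""
    return side_mm * side_mm


if __name__ == "__main__":
    # Twice the transistors on a die 25% smaller (0.75x the area).
    print(density_gain(2.0, 0.75))
    print(square_die_area_mm2(12.5))
```

So Renoir packs roughly 2.7X the transistors per square mm that Picasso did, on a die of roughly 156 square mm.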
I mentioned advanced power/thermal management earlier. Here's AMD's slide on the topic. And finally the summary slide:
Here's the die-shot.
Tiger Lake is built in the very latest version of Intel's 10nm process, known as 10SF (for SuperFin). It also has a more advanced metal stack. Intel doesn't invite me to their technology day (I don't count as press in their eyes), so I didn't get the detailed presentation that they gave recently. The presentations are all available on video, but with CadenceLIVE and HOT CHIPS, I've not found time to watch them yet.
The chip contains Intel's new Xe graphics engine, which seems to be much higher performance within the same power envelope. The advanced power management can move power between core, communication fabric, and memory subsystem depending on where it is needed. The above image is the end of an animation of many images showing how it adapts over time.
HOT CHIPS had record attendance this year, since people could come from all over the world. There were over 2300 attendees (the slide above was out of date the moment it was made since people kept signing up). The website doesn't make it clear, but I believe you can still register, even though the conference is over, and get access to the videos and presentations. Around the end of the year, they will be opened up and anyone will be able to see them.