The last day of Arm TechCon opened with the return to the event of Charlie Miller. I already covered that in my post Charlie Miller: Stopping Cars Being Hacked Instead of Hacking Them. For my first post about Arm TechCon this year, see my post Arm TechCon: The Keynotes.
Peter came on to talk about what he called the "new 2020 architecture". This is next year's processor. He didn't use the name, but I'm assuming that this is what Ian Smythe had referred to as Matterhorn in his keynote earlier in the week. Presumably, it will eventually be an Arm Cortex-A<big-number-here> but you never know with Arm's processor numbering. Peter did warn us that we'd have to wait until next year for all the details. He told us to expect the obvious: a new branch predictor, more execution units, a bigger re-order buffer, more retirements. Arm, like every other processor vendor, was blindsided by Spectre and Meltdown a couple of years ago. They have had to modify large parts of the CPU to address this, adding pointer authentication and branch target identifiers. We already know that this processor (assuming it is Matterhorn) will have a fast matrix-multiply instruction, and support for the "brain float" format bFloat16.
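For anyone who hasn't run into the "brain float" format before, here is a rough sketch of the idea in Python. This is not Arm's implementation and says nothing about the new processor's instructions. It only illustrates the well-known layout: bfloat16 keeps the sign bit, the full 8-bit exponent, and the top 7 mantissa bits of an IEEE 754 single-precision float. The function names are my own, and I am assuming simple truncation, whereas real hardware typically rounds to nearest.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x.

    Assumption: plain truncation of the float32 encoding (drop the low
    16 bits). Real hardware usually rounds to nearest-even instead.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


def roundtrip(x: float) -> float:
    """Show the value x becomes after a trip through bfloat16."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(x))


if __name__ == "__main__":
    for sample in (1.0, 3.14159, -2.0):
        print(sample, hex(float32_to_bfloat16_bits(sample)), roundtrip(sample))
```

The point is that the range stays the same as a normal 32-bit float (same exponent), but the precision drops to two or three decimal digits, which is a trade-off that neural network workloads generally tolerate well.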
Now that I am writing this presentation up as a blog post, I realize that it is a little unclear which things are in their 2020 product line, and which things are current. In other words, what is available now, and what is on their roadmap.
He divided the solution space that Arm processors are aimed at into three:
Mobile (client) is all about power and flexibility. It is 10 years since the initial introduction of big.LITTLE, which had very simple switching models between the CPUs. Now there will be more big cores, more little cores ("we will talk more about that next year"). There is a new fourth-generation display processor (DPU). Integrated GPU. Integrated NPU (neural network). There is a big focus on security. App store cracking costs the industry over $100M. Memory tagging is implemented with performance and efficiency in mind across Arm's IP. Security is improved by layers. Secure EL2 for hypervisors (to be introduced in 2020) from CPU to main memory. This creates strength in both depth and breadth.
Infrastructure is all about scale-out. Cache coherency, wider vector performance, transactional memory. It pushes many-core performance, acceleration, and scale-out. Direct-connect to optimize latency and cache in many-core configurations. System MMU scales to millions of translations per second in 64+ core solutions. Infrastructure includes translation as a service. Peter said that all this complexity means that the design cost of the memory management unit is now the same as a small Cortex-A class processor.
Automotive is a hybrid of mobile and HPC but with functional safety. There is dual-core lockstep across the whole processor roadmap (Cortex-A, Cortex-M, Cortex-R). It retains scale-out but adds more features. Compute power is re-balanced towards the NPU for inference. It charts new waters in heterogeneity. There is a safety island with ASIL-D hardware using Cortex-R52. The focus is on frame processing time, which is critical. The data flow from sensor, to ISP, accelerators, CPU, and NPU has to be optimized. There is constant interaction between the CPU and NPU.
The approach Arm is taking for all of these is to integrate everything onto a single chip, not crossing back and forth between multiple physical boards (so no need to create lots of PCIe drivers).
One change that was announced by Simon Segars in his keynote (see my earlier blog post linked above) is custom instructions, which augment the datapath with hooks to add your own data-processing instructions. Partners can thus innovate and differentiate within the Arm ecosystem. Initial implementations of this are focused on Cortex-M and Cortex-R. Cortex-A custom instructions will come later. This capability is perfect for non-user-visible software since you can add 1, 10, or 20 more instructions and get a lot of additional performance. But it doesn't make sense in user-visible software such as apps on a mobile phone, since it would create fragmentation (and, for the time being, Cortex-A doesn't have custom instructions anyway).
I've loved going to Greg Yeric's keynotes ever since I saw his keynote about the end of Moore's Law at IEDM in Washington DC soon after I rejoined Cadence in 2015. He criticized the audience for focusing on the typical performance of their devices and ignoring the spread from worst to best case (focusing on the mean and ignoring the standard deviation). Designers don't care about the typical performance, he said, since their designs have to work at the worst corner, too. I think it went over the heads of most of the audience, who are mainly device physicists with very little idea how real chips are actually designed, since they are focused on designing an individual perfect device. You can read more about what he said back then in my post Moore's Law at 50: Are We Planning for Retirement?
Greg gave the final keynote at Arm TechCon this year on The ICs of 2030. He just got 20 minutes, and covered a lot in that time. Here's one of the most memorable images:
Nope, that's not Moore's Law. The Y-axis is in dollars. That is the cost of a stepper. A current-model EUV stepper is around $120M. Next-generation high-NA EUV steppers are expected to be more like $250M. Extend the curve out to 2030 and we'll be at the billion-dollar-stepper.
I could write a whole blog post on Greg's presentation. But as it happens, I don't have to because he did it himself. His blog post contains a lot more than was in the presentation as given. To tempt you to go and read it, here is his table of contents (I said it contained a lot):
If you are the kind of person (like me) who likes to put things in your diary a year in advance, Arm TechCon next year will be October 6 to 8, 2020.