The Linley Processor Conference always opens with a keynote by Linley Gwennap giving an overview of processors in whatever is the hottest area. Most of the other presentations during the conference tend to be in that same area, so the keynote sets a context into which they slot. In the distant past, the focus was on data center processors. Then the most important processors in the world were all mobile. But that market largely went away once it became clear that the mobile players at the high end were going to build their own application processors, and Qualcomm and MediaTek had most of the merchant market. Now, as someone said as an aside in a question, "this is the AI processor conference".
Indeed. Linley's keynote was titled The Next Generation of AI Processors.
Of course, like everyone else, Linley's presentation came from...his living room. He started off with the obvious trend that models are getting bigger, since larger models usually produce more accurate results. Image processing models are growing around 2X per year, and language processing models around 10X per year. This is definitely true in the academic world, where all that counts is accuracy and cost is ignored completely. But in the real world, where the focus is on cost-benefit and not just benefit, I'm not sure how true that is. A company might be able to make their natural language recognition 0.1% better with a model 100 times as large, but that is probably not the tradeoff they would make.
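To see how quickly those two growth rates diverge, here is a back-of-envelope compounding calculation; the 100M-parameter starting point is my own illustrative number, not a figure from the keynote:

```python
# Compound the growth rates quoted above over three years.
# The 100M-parameter starting size is an assumed illustrative figure.
start_params = 100e6
years = 3

image_model = start_params * 2 ** years      # ~2x per year
language_model = start_params * 10 ** years  # ~10x per year

print(f"image model after {years} years: {image_model / 1e6:.0f}M parameters")
print(f"language model after {years} years: {language_model / 1e9:.0f}B parameters")
```

At 10X per year, the language model is over a hundred times larger than the image model after only three years, which is why the cost side of the tradeoff matters so much outside academia.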
One challenge in the deep-learning world is how to assess the "goodness" of a solution. The MLPerf organization orchestrates standard benchmarks, but usually people take ResNet-50 or AlexNet (image recognition networks) and rate how many images per second a processor can handle (pretty much any processor can do more than 10,000 images per second on ResNet-50). This is better than just quoting a number like "TOPS", since a lot depends on the size of the MACs: it is a lot easier to deliver a TOPS of 8-bit operations than of bfloat16, which in turn is easier than FP32.
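To make that precision dependence concrete, here is a sketch under assumed numbers: the lane count, clock rate, and the 2x/4x lane cost of the wider formats are all my own illustrative assumptions, not figures from the talk:

```python
# Illustrative (assumed) datapath: 1024 8-bit MAC lanes at 1 GHz.
# A MAC counts as two operations (one multiply plus one add).
# Wider formats are assumed to consume more lanes per MAC, so the
# same silicon delivers fewer TOPS at higher precision.
LANES_8BIT = 1024
CLOCK_HZ = 1e9
OPS_PER_MAC = 2

def tops(lanes_per_mac):
    """Tera-operations per second for a given lane cost per MAC."""
    macs_per_cycle = LANES_8BIT // lanes_per_mac
    return macs_per_cycle * OPS_PER_MAC * CLOCK_HZ / 1e12

print(tops(1))  # int8:     2.048 TOPS
print(tops(2))  # bfloat16: 1.024 TOPS (assuming 2x lane cost)
print(tops(4))  # FP32:     0.512 TOPS (assuming 4x lane cost)
```

Under these assumptions the same chip can honestly quote a 4X higher TOPS number at 8 bits than at FP32, which is why a raw TOPS figure says little without the precision attached.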
Linley sees a lot more processors than I do (probably than anyone does -- it's his day job), so his perspective that a wide range of core sizes is feasible is important. Small cores are easier to replicate but complicate both the interconnect and the software compilation required to map the model onto the system. The designs range all the way from Cerebras at the high end (who presented at the conference and whom I wrote about last year in HOT CHIPS: The Biggest Chip in the World), with over 400,000 cores, to Groq (who also presented), with one huge core. Everyone else is in between, mostly with under a hundred cores, and a few with closer to one thousand.
Most neural networks are based on MACs (multiply-accumulate units), often doing the multiply at one precision and the accumulate (add) at a higher precision so that the running sum neither overflows (fixed-point) nor loses precision (floating-point). But that is not the only way to implement a neural network. You can also use:
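The mixed-precision MAC scheme can be sketched in a few lines; the function name and vector sizes below are mine, for illustration:

```python
import numpy as np

# Sketch of a mixed-precision MAC: multiply 8-bit operands, but widen the
# products to 32 bits before summing so the running total cannot overflow.
def mac_int8(a, b):
    assert a.dtype == np.int8 and b.dtype == np.int8
    products = a.astype(np.int32) * b.astype(np.int32)
    return np.int32(products.sum())

# Worst case for int8: 1000 products of 127 * 127 each.
a = np.full(1000, 127, dtype=np.int8)
b = np.full(1000, 127, dtype=np.int8)
acc = mac_int8(a, b)  # 1000 * 127 * 127 = 16,129,000, far beyond int8 range
```

Even a single product of two int8 values (up to 16,129) overflows an 8-bit accumulator, so the wider accumulate is essential, not an optimization.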
Linley moved on to look at the suppliers to the data center market. NVIDIA is the king of the hill, with the 2017 Volta V100 the best-selling GPU for the purpose, although the 2018 Turing T4 has overtaken the V100 in units (but not revenue). The next-generation Ampere product remains unannounced, although it is expected to deliver a big boost for AI.
But competition is coming in the training (data center) area:
On the inference side, as Linley put it, "vendors come and go":
As to the cloud vendors themselves, they have internal projects, too (which always raises the question of who the merchant semiconductor companies will sell to):
Most new cars in the US, EU, and similar markets have level 2 ADAS features such as AEB (automatic emergency braking), LKA (lane-keep assist), and ACC (adaptive cruise control). Most automakers plan to skip from level 2 straight to level 4 due to handoff problems at level 3. Initial models will be geofenced and limited to daylight and good weather. Many now consider level 5 (fully autonomous, no controls) to be unattainable. The road to autonomy is taking longer than expected, and the initial focus is on commercial vehicles, where the economics are clearer.
Moving on to data centers on wheels, also known as automotive, the battle in the merchant space is currently between Intel (Mobileye) and NVIDIA, with Mobileye having the lowest power and lowest cost, but NVIDIA the highest performance. In terms of general progress, GM customers have driven over 5M miles in level 3 "super cruise" mode. Tesla plans to deliver "full self-driving" this year as an OTA upgrade. Waymo has performed thousands of driverless rides in Phoenix, some with and some without safety drivers. Intel Mobileye continues to be committed to a robotaxi service in Jerusalem by 2022. GM commits to producing Cruise by 2022, a level 4 robotaxi with no steering wheel or other controls. But Linley is more cautious about timing and says "we expect many luxury brands to offer some level 4 consumer vehicles by 2025".
AI accelerators are now standard in mid-premium and premium smartphones:
But to me, it remains a little unclear exactly what they are doing (beyond face recognition to unlock, etc).
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.