Paul McLellan

Linley Gwennap's Deep Dive into Deep Learning

1 May 2019 • 4 minute read

At the recent Linley Spring Microprocessor Conference, Linley Gwennap kicked off with the opening keynote on what is clearly the biggest thing to hit processors in a long time: deep learning.

Linley started with an overview of deep learning and the latest trends. I'm going to skip that since I've covered the basics in many posts already. Start here if you need to get up to speed:

  • HOT CHIPS Tutorial: On-Device Inference
  • Bagels and Brains: SEMI's Artificial Intelligence Breakfast
  • Deep Learning and the Cloud
  • HOT CHIPS: Some HOT Deep Learning Processors
  • Inside Google's TPU

Even if you don't have time to read these, a quick summary would be that:

  • For training, it is mostly 32-bit floating point FP32 (with some FP16), but there is also an up-and-coming bfloat16 format that keeps the 8-bit exponent range of FP32 with just 7 bits of mantissa, so less precision but greater range than regular FP16
  • For inference, especially on the edge, it is mostly 8-bit multiplies with 32-bit accumulates, using integer arithmetic rather than floating point (see the sketch after this list)

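To make those formats concrete, here is a minimal Python sketch (my own illustration, not from the talk; the function names are mine) of how bfloat16 relates to FP32 and what an 8-bit multiply with 32-bit accumulate looks like:

import numpy as np

def fp32_to_bfloat16_bits(x: float) -> int:
    """Truncate an FP32 value to bfloat16 by keeping its top 16 bits.
    bfloat16 reuses FP32's 8-bit exponent (same range) but keeps only
    7 mantissa bits (less precision); real hardware rounds, this simple
    truncation just shows the bit layout."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return int(bits) >> 16

def int8_dot(a, b) -> int:
    """8-bit multiplies with a 32-bit accumulate, the usual recipe for
    edge inference: each int8 x int8 product fits in 16 bits, and the
    wide accumulator keeps long sums from overflowing."""
    acc = np.int32(0)
    for x, y in zip(np.asarray(a, dtype=np.int8), np.asarray(b, dtype=np.int8)):
        acc += np.int32(x) * np.int32(y)
    return int(acc)

# 1.0 in FP32 is 0x3F800000, so its bfloat16 truncation is 0x3F80
assert fp32_to_bfloat16_bits(1.0) == 0x3F80
print(int8_dot([100, -50, 127], [120, 90, 127]))  # 23629
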
Deep Learning in the Datacenter

You probably already know that NVIDIA and Intel dominate in the datacenter. Intel has added AI-boost instructions to the new Cascade Lake that triple its performance (at least for things like 8-bit arithmetic). Intel presented that processor at the conference, and I'll cover it in another post. Or, if you want a preview, Intel gave some details about it at HOT CHIPS last summer, which I covered in Intel's Cascade Lake: Deep Learning, Spectre/Meltdown, Storage Class Memory.

Today, the most popular training option is the NVIDIA V100 Volta. This is a GPU with tensor cores delivering 125 TFLOPS (FP16) at 300W. The NVIDIA T4 is a PCIe accelerator for AI inference with integer-optimized cores, delivering an estimated 80 TOPS of 8-bit arithmetic at 70W.
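As a rough sanity check (my own back-of-the-envelope arithmetic, not figures from the talk; variable names are mine), here is what those numbers imply in operations per watt:

v100_tflops_fp16, v100_watts = 125, 300   # NVIDIA V100, FP16 with tensor cores
t4_tops_int8, t4_watts = 80, 70           # NVIDIA T4, estimated 8-bit TOPS

print(f"V100: {v100_tflops_fp16 / v100_watts:.2f} TFLOPS per watt (FP16)")
print(f"T4:   {t4_tops_int8 / t4_watts:.2f} TOPS per watt (INT8)")
# Roughly 0.42 vs 1.14: the integer-optimized inference card delivers about
# 3x the operations per watt, although an FP16 multiply-add and an 8-bit
# multiply with 32-bit accumulate are not directly comparable operations.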

But new challengers are emerging. 

  • Habana is in production with its Goya accelerator for inference, with leadership performance on ResNet-50
  • Graphcore offers the GC2 chip with 1,216 FP16-capable cores
  • Xilinx offers the Alveo accelerator, a preprogrammed FPGA that beats the NVIDIA T4 for inference at small batch sizes
  • Huawei's Ascend MAX is due to ship in mid-2019
  • Wave has developed a dataflow processor with 16,384 cores, available either as systems or as licensable IP

In addition, cloud providers have their own:

  • Google deployed TPUv1 (inference only), and then v2 and v3 (training and inference)
  • Microsoft uses FPGA-based Brainwave for some inference
  • Annapurna (acquired by Amazon) designed the Inferentia ASIC for deployment in AWS in 2H19
  • Alibaba and Baidu have developed FPGA-based accelerators

Inference at the Edge

The general trend is to move AI to the edge. Today, products like Alexa and Siri send a compressed voice recording up to the cloud for voice recognition and natural language processing. But, especially in aggregate, there is a huge amount of processing power at the edge, and it makes sense to use that rather than adding more cloud capacity. It also reduces latency, works offline, and is better from a privacy point of view.

High-end smartphones have AI capability:

  • Apple A11 and A12 processors integrate a neural engine
  • Samsung Exynos 9810, in the Galaxy S9, uses the neural engine from DeePhi (Xilinx)
  • Huawei Kirin 970 and 980 feature a neural engine from Cambricon
  • Qualcomm Snapdragon 845 and 855 include Hexagon vector DSP
  • Mediatek P90 uses Tensilica Vision P6 with a custom neural engine

This AI inference is creeping down into both mid-range phones and also IoT devices such as voice assistants, smart security cameras, and drones. In particular:

  • Intel Myriad X has vision-processing hardware and direct camera interfaces, offering 1 TOPS at 2.5W (with SPARC under the hood, at least for now)
  • Bitmain BM1880, with Arm under the hood, also offers 1 TOPS at 2.5W
  • Chinese chipmakers are developing chips for this market (since most of the cameras come from China): Canaan/Kendryte, Cambricon, HiSilicon, Horizon Robotics
  • Arm Cortex-M4 has DSP extensions to improve performance
  • Eta Compute offers the Tensai MCU with a CoolFlux DSP accelerator
  • Greenwaves GAP8 MCU implements an 8-core RISC-V accelerator

Automotive

One particular type of "edge device" is the automobile. (I won't cover automotive trends here; I have covered them repeatedly in the last few years. If you need a kick-start, begin with Automotive Summit: The Road to an Autonomous Future.) The current competitors in the merchant market are NVIDIA and Intel.

  • Intel's Mobileye subsidiary is the market leader for level 1/2 ADAS. It offers the lowest power at the lowest cost with $700M in 2018 revenue
    • EyeQ4 (production) offers 2 TOPS at 6W
    • EyeQ5 (2020) offers 12 TOPS at 5W
  • NVIDIA offers powerful processors targeted at level 4/5 and is the leader in autonomous vehicle design wins
    • Xavier (production) offers 30 TOPS at 30W
    • Pegasus Drive can combine one or two Xavier chips and one or two Volta GPUs

It was too late to get a mention at the conference, but Tesla recently announced their FSD processor SoC. See my recent post Tesla Drives into Chip Design.

Conclusions

  • AI is moving from adapted architectures such as GPUs and DSPs to special-purpose custom-designed architectures with MAC arrays
  • Inference is moving from the data center to the edge
  • Smartphone vendors see AI performance as a differentiator (face recognition, etc)
  • Automotive is a big processor opportunity
  • IoT is the highest volume area for AI-enhanced processors

If you have been reading Breakfast Bytes regularly, you already knew much of that. But this post contains a lot of detail about who the players are and what their current (and, in some cases, future) offerings are.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.