This morning, at a workshop at the AI Hardware Summit, Cadence is laying out details of its AI strategy and a new way that the product line of Tensilica processors will be structured going forward. On Wednesday morning, Sanjive Agarwala will go into more detail in his presentation that follows Lip-Bu Tan's keynote. His presentation is titled Scalable On-Device to Edge AI for Pervasive Intelligence. He started by pointing out that on-device (as opposed to in-data-center) AI is growing at a CAGR of 37%, and is forecast to reach $50B by 2025. However, bucketing everything together like that hides a very important point: the requirements for on-device AI vary by orders of magnitude, from under 1 TOPS (tera-operations per second) for a smart sensor up to hundreds of TOPS for autonomous driving and ADAS. This requires a range of different processors.
Tensilica processors have been around for a long time, longer really than AI has existed in its modern incarnation of training and using neural networks. So it is not that obvious that Tensilica processors are used for a lot of AI applications. For example, the HiFi series of processors was originally designed, and named, for high sound quality and support for decoders like Dolby Atmos. But we added AI features to HiFi since it is increasingly used for applications like smart speakers ("Alexa, turn on the light") where at least wake-word recognition, if not more, is done on-device. The same goes for vision processors and the rest of the product line.
It is not just performance that varies across these applications. Other attributes matter too, such as power efficiency (often measured in TOPS/W, since how much bang you get for your buck, or rather watt, is important) and which data types are required (floating point, 8-bit integer, and so on).
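To make the two attributes above concrete, here is a minimal sketch of the arithmetic behind a TOPS/W figure and of how the choice of data type changes the memory needed for a model's weights. The function names and all of the numbers are hypothetical, chosen only for illustration; they are not Cadence figures.

```python
def tops_per_watt(ops_per_second: float, watts: float) -> float:
    """Return efficiency in tera-operations per second per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ops_per_second / 1e12 / watts


def weight_bytes(num_weights: int, bits_per_weight: int) -> int:
    """Return the bytes needed to store the weights, rounded up to whole bytes."""
    return (num_weights * bits_per_weight + 7) // 8


# Hypothetical example: a block sustaining 2e12 ops/s while drawing 0.5 W.
efficiency = tops_per_watt(2e12, 0.5)

# Hypothetical example: one million weights as 32-bit floats vs 8-bit integers.
fp32_bytes = weight_bytes(1_000_000, 32)
int8_bytes = weight_bytes(1_000_000, 8)

print(f"{efficiency} TOPS/W, fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
```

With these made-up inputs the 8-bit model needs a quarter of the storage of the 32-bit floating-point one, which is one reason the supported data types matter as much as raw throughput.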
Going forward, the Tensilica product line for AI will, like Caesar's Gaul, be divided into three parts: AI Base, AI Boost, and AI Max.
All of the NNE (neural network engine) and NNA (neural network accelerator) products include random sparse compute to improve performance, run-time tensor compression to decrease memory bandwidth, and pruning plus clustering to reduce model size.
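For readers unfamiliar with the terms, pruning and clustering can be illustrated with a small generic sketch. This is not Cadence's implementation, and the details of how the NNE and NNA products do this are not given here. The sketch assumes simple magnitude pruning (zeroing the smallest weights) followed by one-dimensional k-means clustering (forcing the surviving weights to share a few values); the function names and the sample weights are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def cluster_weights(weights, k, iterations=10):
    """Snap each nonzero weight to the nearest of k shared centroids."""
    nonzero = [w for w in weights if w != 0.0]
    if not nonzero:
        return list(weights), []
    lo, hi = min(nonzero), max(nonzero)
    # Start with centroids spread evenly across the range of values.
    centroids = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for w in nonzero:
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(w - centroids[j]))
            groups[nearest].append(w)
        # Move each centroid to the mean of its group; keep empty ones as-is.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    clustered = [0.0 if w == 0.0 else min(centroids, key=lambda c: abs(w - c))
                 for w in weights]
    return clustered, centroids


# Hypothetical weights, for illustration only.
weights = [0.9, -0.05, 0.8, 0.02, -0.7, 0.01, -0.75, 0.85]
pruned = prune_by_magnitude(weights, sparsity=0.375)
clustered, centroids = cluster_weights(pruned, k=2)
print(pruned)
print(clustered)
```

After both steps, three of the eight weights are zero and the other five take only two distinct values, so each could in principle be stored as a one-bit index into a two-entry table rather than as a full-precision number. The zeros are also what sparse compute hardware can skip.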
The software stack is common to all processors and addresses all target applications, streamlining product development and enabling easy migration as design requirements evolve. This software includes the Tensilica Neural Network Compiler, which supports these industry-standard frameworks: TensorFlow, ONNX, PyTorch, Caffe2, TensorFlow Lite, and MXNet for automated end-to-end code generation; Android Neural Network Compiler; TFLite Delegates for real-time execution; and TensorFlow Lite Micro for microcontroller-class devices.
Automotive applications probably span the biggest range of performance requirements, from AI Base for something like driver monitoring, through AI Boost for ADAS features like lane assist or automatic emergency braking, all the way up to AI Max for autonomous driving.