Paul McLellan

Kneron's Experience Reducing Edge AI Processor Development Schedules with Tensilica DSPs

10 Feb 2021 • 5 minute read

  As late as 2010, the received wisdom among computer scientists was that neural networks would not amount to much. Those ideas had been tried repeatedly for decades and never produced practical results. Yes, our brains work that way, but it seemed that computer hardware and software could not achieve anything like the same results as "wetware". In two years, everything changed. What happened?

 First, ImageNet happened. For more details, see my post ImageNet: The Benchmark that Changed Everything. Despite its name, ImageNet is actually a huge set of image training data, millions of pictures on the web that have been classified as to what they contain ("a dog", "a dalmatian", and so on). ImageNet was first announced in 2009 in a poster session — apparently not significant enough to merit a full paper. It turned out to catalyze a lot of research in vision recognition that had been stymied before by the lack of good annotated data. The second thing that happened was that in 2012, a neural network known as AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), beating all the non-neural-network approaches. That was just nine years ago.

In the intervening nine years, neural networks have improved almost immeasurably. It turns out that for many tasks, such as language recognition or identifying pedestrians in a video, "training" a neural network is a better approach than traditional programming. But it requires a lot of training data and a lot of compute power; luckily, the cloud came along at around the same time, and people worked out how to use GPUs to accelerate the training. Neural nets got bigger and required more and more training. Famously, GPT-3, a language network with 175 billion weights, is estimated to have cost nearly $5M to train.

Inference at the Edge

So are we going to put a cloud data center in our cars, and fill them up at the bank instead of the gas station (or the charger)?

The second part of using a neural network, once it is trained, is known as inference. This is a simpler task since it is not iterative. Even so, the amount of compute power required is huge when done on a conventional microprocessor. It turns out that conventional microprocessors are just not a good architecture for neural networks, since too little is done in parallel. Sometimes inference can be done in the cloud, such as classifying my entire photo album. But we really want inference at the edge, in devices such as smartphones, Alexa-type devices, or cars. Doing this in practice requires special chips, because there is just not enough of the right compute power in these devices. Inside those chips there are two key components: a DSP optimized for neural network inference, and some sort of array of multiplier-accumulators (MACs) that can be driven by the DSP.
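To make the parallelism argument concrete, here is a minimal C sketch (my own illustration, not code from Kneron or Cadence) of the multiply-accumulate loop that dominates inference. A conventional CPU retires roughly one MAC per cycle in a loop like this; a dedicated MAC array retires hundreds or thousands per cycle, which is why edge inference needs special silicon.

#include <stdint.h>
#include <stddef.h>

/* Illustrative only: the dot product at the heart of a convolution or
   fully connected layer is just a long chain of multiply-accumulates. */
int32_t dot_product_int8(const int8_t *weights, const int8_t *activations, size_t n)
{
    int32_t acc = 0;                       /* wide accumulator to avoid overflow */
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)weights[i] * (int32_t)activations[i];   /* one MAC */
    return acc;
}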

For an overview of this, see my posts HOT CHIPS Tutorial: On-Device Inference, and Linley Processor Conference 2020 Keynote.

Cadence's Tensilica portfolio of DSPs has several processors that can be used for this. The HiFi series is optimized for audio processing, the Vision series for (surprise!) vision processing, and the DNA series for the most demanding workloads needing trillions of MAC operations per second (tera-operations per second, or TOPS). My most recent posts on the Tensilica family are:

  • The New Tensilica DNA 100 Deep Neural-Network Accelerator
  • Vision Q7 DSP: Real-Time Vision and AI at the Edge
  • HiFi DSPs - Not Just for Music Anymore
  • Tensilica ConnX B20 for 5G, and Automotive Radar/Lidar

Kneron

One company designing inference-at-the-edge chips with Tensilica DSPs is Kneron. Kneron is a San Diego-based startup that designs and develops integrated software and hardware solutions that enhance smart devices, for applications such as smart surveillance, security, mobile, robotics, and industrial control. Based on its early successes, Kneron has received a variety of industry accolades, including earning an honorable mention in Gartner’s Cool Vendors in AI Semiconductors 2020 report and being named a winner in the Business Intelligence Group’s 2020 Artificial Intelligence Excellence Awards program. CB Insights also named Kneron in its 2020 AI 100 ranking of the 100 most promising AI startups in the world, while EE Times named the company in its 2020 Silicon 100 list. I think that qualifies as being a hot startup.

On-device AI inferencing requires striking a balance of performance, power, and size. Kneron’s next-generation 1.4 TOPS edge AI processor features higher power efficiency and a higher compression rate with lower accuracy loss compared to its predecessor. When designing this chip, Kneron needed to accelerate mathematical computing functions while also minimizing power and maintaining a small chip area. They needed a solution capable of co-processing complex computer vision algorithms alongside their proprietary neural processing unit, as well as a programmable, future-proof solution capable of offloading their fast-growing AI workloads.
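The post doesn't describe Kneron's actual compression scheme, but to show the kind of trade-off involved, here is a minimal sketch assuming simple symmetric per-tensor quantization: squeezing each 32-bit float weight into 8 bits cuts model size by 4X at the cost of some rounding error, exactly the compression-versus-accuracy balance described above.

#include <math.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical illustration, not Kneron's method: symmetric
   post-training quantization of float weights to int8. */
float quantize_weights_int8(const float *w, int8_t *q, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = fabsf(w[i]);
        if (a > max_abs)
            max_abs = a;
    }
    float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++)
        q[i] = (int8_t)lrintf(w[i] / scale);   /* round to nearest int8 step */
    return scale;   /* dequantize later as w[i] ~ q[i] * scale */
}

The returned scale factor is all that is needed to map the int8 values back to approximate floats at inference time.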

Kneron picked the Tensilica Vision P6 DSP, which provides up to 2X faster performance for computer vision and neural network pre- and post-processing compared to Kneron's prior-generation SoC, while delivering crucial power efficiency for edge AI applications. For more details on the Vision P6 DSP, see my post from when we announced it, New Algorithms for Vision Require a New Processor. The Vision P6 DSP supports an N-way programming model, allowing users to write source code that is portable across the family of Vision DSPs, promoting both backward compatibility and future-readiness. In fact, the future is already here if you are starting a new project today: see my post Vision Q7 DSP: Real-Time Vision and AI at the Edge.
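To illustrate the idea behind an N-way programming model, here is a conceptual sketch in plain C. It does not show the actual Xtensa vector types or intrinsics; the NWAY macro is my hypothetical stand-in for the lane count that the real toolchain binds per DSP generation, so the same source retargets when a wider machine comes along.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical lane count, bound at build time per Vision DSP generation. */
#ifndef NWAY
#define NWAY 64
#endif

/* Saturating add over 8-bit pixels, written in NWAY-wide strips so a
   vectorizing compiler can map each strip onto the DSP's SIMD lanes,
   whatever width the target generation has. */
void saturating_add_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i += NWAY) {
        size_t lanes = (n - i < NWAY) ? (n - i) : NWAY;
        for (size_t j = 0; j < lanes; j++) {
            uint16_t s = (uint16_t)a[i + j] + (uint16_t)b[i + j];
            out[i + j] = (s > 255) ? 255 : (uint8_t)s;   /* saturate at 255 */
        }
    }
}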

Summary

  • Key challenges
    • Pre- and post-processing functions are compute-intensive and not optimal for NPU
    • Need future-proof solution capable of offloading fast-growing future AI workload
  • Solution
    • Tensilica Vision P6 DSP
    • Xtensa Imaging Library
    • Xtensa Neural Network Compiler
    • VIP for DSP Integration
  • Results
    • Vision P6 DSP provides up to 2X performance gains
    • Development cycle time reduced by 10%

Or, as Albert Liu, the founder and CEO of Kneron, put it:

Removing hurdles and making AI algorithm deployment on our platform easy is key for us and our customers’ success as our mission is to enable AI everywhere, for everyone. The Tensilica Vision P6 DSP packs a lot of compute capacity to tackle the latest AI challenges.

Learn More

There is a success story on the Kneron development: Kneron and Cadence: Reducing Edge AI Processor Development Cycle Time.

For more on Cadence's Tensilica family of processors, follow the links earlier in this post. Or start at the Tensilica Processor product page.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.