Vision Q7 DSP: Real-Time Vision and AI at the Edge

15 May 2019

At CDNLive EMEA, we announced the latest member of the Tensilica family at the press conference, although the news was embargoed until this morning: the Tensilica Vision Q7 DSP. The earliest Vision cores focused purely on image processing in the photographic sense (for example, compressing images from surveillance cameras), but later cores such as the new Vision Q7 also handle AI processing for tasks like feature recognition (for example, surveillance cameras that record only when something abnormal is happening). The new core targets the automotive, augmented reality (AR), virtual reality (VR), drone, mobile, robotics, and surveillance markets, all of which combine cameras with sophisticated processing.

Growth of Image-Sensor Market

Every image sensor needs some sort of image processor. The above chart shows the market growth, which is forecast to have a compound annual growth rate (CAGR) of 12% from 2018 to 2025. The main segments are:

Mobile, growing at a CAGR of about 8% but already far and away the biggest market (most high-end phones have at least three cameras)
Automotive, growing at a CAGR of 36%
Other markets, such as AR/VR, drones, and robotics, growing at a CAGR of 12%
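To put these growth rates in perspective, here is a small sketch that converts a CAGR into the total growth multiple over the forecast window. It assumes the usual compounding definition, multiple = (1 + CAGR)^years, and treats 2018 to 2025 as seven years of growth; the rates are the ones quoted above, and the function name is hypothetical.

```python
def growth_multiple(cagr: float, start_year: int, end_year: int) -> float:
    """Total growth factor implied by a compound annual growth rate."""
    years = end_year - start_year
    return (1.0 + cagr) ** years

# CAGR figures quoted in the text, for 2018 to 2025.
segments = {
    "Overall image-sensor market": 0.12,
    "Mobile": 0.08,
    "Automotive": 0.36,
    "AR/VR, drones, robotics": 0.12,
}

for name, cagr in segments.items():
    print(f"{name}: {growth_multiple(cagr, 2018, 2025):.2f}x")
```

On these assumptions, the overall market grows by about 2.2x over the period, mobile by about 1.7x, and automotive by a factor of more than eight.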

SLAM

One particular application, used in many end markets, is known as SLAM, for Simultaneous Localization and Mapping. Meera gave an overview of it in her post SLAM: It’s Not Just a Wrestling Term. SLAM is used in the robotics, drone, mobile, and automotive markets to construct a map of an unknown environment automatically while keeping track of where the device is within it. In some ways, it is a real-world equivalent of what goes on in a lot of video games. It relies on a variety of linear algebra and matrix operations and is computationally heavy. Today, it is typically implemented on a CPU or GPU. CPUs offer relatively low performance yet consume a lot of power. GPUs have a lot of parallelism and can meet real-time requirements, but they require very large amounts of power and generate a lot of heat.
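As a flavor of the linear algebra involved, the sketch below shows one small, standard building block rather than a SLAM implementation: using a 3x3 homogeneous transform to map a landmark seen in the robot's own frame into the map frame, given the robot's estimated 2D pose. The function names and the example pose are hypothetical. A real SLAM system performs operations like this for every observation, typically alongside much larger matrix computations to refine the pose and map estimates, which is why the workload is heavy.

```python
import math

def pose_to_matrix(x: float, y: float, theta: float) -> list:
    """3x3 homogeneous transform for a 2D pose: position (x, y), heading theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x],
            [s, c, y],
            [0.0, 0.0, 1.0]]

def transform_point(T: list, px: float, py: float) -> tuple:
    """Map a point from the robot frame into the map frame: p_map = T @ [px, py, 1]."""
    v = (px, py, 1.0)
    return tuple(sum(T[i][j] * v[j] for j in range(3)) for i in range(2))

# Hypothetical example: robot at (2, 1) facing 90 degrees sees a landmark 3 m straight ahead.
T = pose_to_matrix(2.0, 1.0, math.pi / 2)
x, y = transform_point(T, 3.0, 0.0)
print(f"Landmark in map frame: ({x:.2f}, {y:.2f})")  # prints "Landmark in map frame: (2.00, 4.00)"
```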

AI at the Edge

Another general trend is to do more AI processing at the edge rather than in the cloud. This is driven by a number of factors:

Latency: Moving data to the cloud and back takes time, which may impact responsiveness
Connectivity: A response may be necessary even if the link to the cloud is down (driving, for example)
Privacy: People are increasingly concerned about all the data their consumer devices like Alexa and smart TVs may be uploading to the cloud
Compute power: There is a lot of aggregate compute power available at the edge, especially in smartphones, so it makes sense to use it rather than adding compute power in the cloud

The amount of processing required varies from hundreds of TMACS (tera multiply-accumulates per second) for automotive and data center applications down to 512 GMACS (giga multiply-accumulates per second) for intelligent smart-home devices. As a result, one size does not fit all in these markets, which is one reason the Cadence Tensilica family has what some find a confusingly broad portfolio. You do not want to use a processor capable of tens of TMACS for a task requiring 500 GMACS, since it would waste power and area. Conversely, you simply can't use a processor that delivers 1 TMACS if the task requires 10 TMACS.
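The mismatch argument is easy to quantify. In an idealized model where every MAC unit does useful work on every cycle, peak throughput is simply the number of MAC units times the clock frequency. The sketch below uses that model; the 1 GHz clock is a hypothetical figure chosen for round numbers, not a specification of any Tensilica core, and real sustained throughput is lower because of memory traffic and imperfect utilization.

```python
import math

def peak_gmacs(macs_per_cycle: int, clock_ghz: float) -> float:
    """Idealized peak throughput in GMACS, assuming every MAC unit is busy every cycle."""
    return macs_per_cycle * clock_ghz

def macs_per_cycle_needed(target_gmacs: float, clock_ghz: float) -> int:
    """Smallest MAC array that could reach target_gmacs at clock_ghz (idealized peak)."""
    return math.ceil(target_gmacs / clock_ghz)

CLOCK_GHZ = 1.0  # hypothetical clock, for illustration only

print(peak_gmacs(512, CLOCK_GHZ))                # 512 MAC units at 1 GHz
print(macs_per_cycle_needed(500, CLOCK_GHZ))     # smart-home class task (500 GMACS)
print(macs_per_cycle_needed(10_000, CLOCK_GHZ))  # 10 TMACS task
```

At this hypothetical clock, a 512-MAC engine peaks at 512 GMACS, which suits a smart-home class workload, while a 10 TMACS workload would need roughly twenty times as many MAC units.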

Performance

The Vision Q7 DSP, the sixth generation of Vision processors, extends the product line. Performance naturally varies by application, but as a rule of thumb the Vision Q7 DSP delivers twice the performance of the Vision Q6 DSP, which in turn delivers about 1.5X the performance of the Vision P6 DSP. The table above gives more detail.
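Because these speedups are relative, they compose by multiplication: if the Vision Q7 DSP is 2X the Vision Q6 DSP and the Vision Q6 DSP is about 1.5X the Vision P6 DSP, then the Vision Q7 DSP is roughly 3X the Vision P6 DSP. The sketch below applies those ratios to a hypothetical workload taking 30 ms per frame on a Vision P6 DSP; the 30 ms figure is invented for illustration, and real results vary by application.

```python
# Relative performance ratios quoted in the text (application-dependent).
Q6_OVER_P6 = 1.5
Q7_OVER_Q6 = 2.0

# Speedup ratios chain by multiplication: 2.0 * 1.5 = 3.0.
Q7_OVER_P6 = Q7_OVER_Q6 * Q6_OVER_P6

def estimated_frame_time_ms(p6_time_ms: float, speedup_over_p6: float) -> float:
    """First-order estimate: scale a P6 frame time by a speedup ratio."""
    return p6_time_ms / speedup_over_p6

P6_FRAME_MS = 30.0  # hypothetical workload

for core, speedup in [("Vision P6", 1.0), ("Vision Q6", Q6_OVER_P6), ("Vision Q7", Q7_OVER_P6)]:
    print(f"{core}: {estimated_frame_time_ms(P6_FRAME_MS, speedup):.1f} ms/frame")
```

On this first-order estimate, the hypothetical 30 ms frame drops to 20 ms on the Vision Q6 DSP and 10 ms on the Vision Q7 DSP.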

Software

All this processing power would not be much use if it were too hard to program. For AI, the Vision Q7 DSP can be programmed with the same Tensilica Neural Network Compiler as other processors in the range. The flow starts from a neural network description in a standard framework such as Caffe or TensorFlow. On Android, the Android Neural Networks API is also supported. See the diagram above.

For vision, there is a full ecosystem of software frameworks and compilers covering the main vision programming approaches, including OpenCL, Halide, and others.

Summary

For AI applications, the Vision Q7 DSP provides a flexible solution delivering 512 8-bit MACs, compared with 256 MACs for the Vision Q6 DSP. For greater AI performance, the Vision Q7 DSP can be paired with the Tensilica DNA 100 processor. Beyond raw compute, the Vision Q7 DSP adds a number of iDMA enhancements, including 3D DMA, compression, and a 256-bit AXI interface. The Vision Q7 DSP is a superset of the Vision Q6 DSP, which preserves customers’ existing software investment and enables easy migration from the Vision Q6 or Vision P6 DSPs.
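For readers less familiar with the terminology, an 8-bit MAC multiplies two 8-bit integers and adds the product to a running sum, which is the inner loop of most neural-network layers. The sketch below models that in plain Python and estimates the ideal cycle count for a layer of a hypothetical one million MACs, reading the 512 and 256 figures as MAC units working in parallel each cycle. The wide accumulator and perfect utilization are illustrative assumptions, not a description of the Vision Q7 DSP's datapath.

```python
INT8_MIN, INT8_MAX = -128, 127

def mac_int8(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate: returns acc + a * b for signed 8-bit operands."""
    if not (INT8_MIN <= a <= INT8_MAX and INT8_MIN <= b <= INT8_MAX):
        raise ValueError("operands must fit in a signed 8-bit integer")
    return acc + a * b

def dot_int8(activations, weights) -> int:
    """Dot product built from MACs. Python ints never overflow, standing in for the
    wide (assumed, e.g. 32-bit) accumulator that hardware would typically use."""
    acc = 0
    for a, w in zip(activations, weights):
        acc = mac_int8(acc, a, w)
    return acc

def ideal_cycles(total_macs: int, mac_units: int) -> int:
    """Cycles needed if every MAC unit does useful work every cycle (ceiling division)."""
    return -(-total_macs // mac_units)

print(dot_int8([1, -2, 3], [4, 5, -6]))  # 1*4 + (-2)*5 + 3*(-6)
LAYER_MACS = 1_000_000  # hypothetical layer size
print(ideal_cycles(LAYER_MACS, 512), ideal_cycles(LAYER_MACS, 256))
```

In this simplified model, doubling the MAC units halves the ideal cycle count (1,954 versus 3,907 cycles here), mirroring the 512 versus 256 MAC comparison above.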

For more details, see the product page.
