Why is image processing so important? To a first-order approximation, all data is image and video. Of course, things like GPS systems generate data too, but by comparison the volume is trivial and the processing simple. In my post Rowen on Vision, Innovation, and the Deep Learning Explosion, I mentioned that Chris talked about how taking a cheap H.264 video camera and naively uploading its raw feed to the cloud for processing would cost nearly $4,000 over three years, per camera. That is not going to happen, so pretty much anything that turns raw camera data into useful information has to be done locally.
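It is worth seeing how a per-camera figure of that order falls out of the arithmetic. The numbers below are my own illustrative assumptions (a 4 Mbps 1080p stream and a blended $/GB covering transfer, storage, and compute), not the exact figures from Chris's talk:

```python
# Back-of-the-envelope cloud cost for streaming one camera's H.264 feed.
# Both constants below are illustrative assumptions, not figures from the talk.

BITRATE_MBPS = 4.0          # assumed 1080p H.264 stream
BLENDED_COST_PER_GB = 0.08  # assumed $/GB for transfer + storage + compute
YEARS = 3

seconds = YEARS * 365 * 24 * 3600
gigabytes = BITRATE_MBPS / 8 * seconds / 1024   # MB/s * s -> MB -> GB
cost = gigabytes * BLENDED_COST_PER_GB

print(f"{gigabytes:,.0f} GB uploaded, ~${cost:,.0f} over {YEARS} years")
```

With these assumptions, one camera uploads on the order of 46,000 GB and costs roughly $3,700 over three years, which is the same order of magnitude as the figure Chris quoted.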
If you read coverage of the announcement of any new smartphone, there is a lot of information about the cameras and the image processing. Why? Because smartphones are more cameras than they are phones. When was the last time you heard anybody talk about voice quality? You will, however, hear plenty of talk about resolution and frame rates: 32 megapixels (MP), 4K video, 120 or even 240 frames per second (fps).
Another trend is towards multiple sensors: two or even three per camera. The chart to the left shows the decline of single-sensor back cameras in blue, the growth of dual-sensor cameras in red, and the start of the ramp for triple-sensor cameras in grey (the front camera, used for short-range applications like selfies and video calls, is still a single sensor). Multiple sensors require more vision processing, not just because of all the extra pixels, but because the images from the separate sensors need to be fused into a single picture or video stream.
The combination of high resolution and multiple sensors, coupled with high-performance image processing, is enabling a new set of killer use cases.
It is instructive to take a look under the hood and see what it takes to put together a modern smartphone. The Huawei Mate 10, Mate 10 Pro, and Porsche Design Huawei Mate 10 form a series of 4.5G smartphones delivering 1.2 Gbps over LTE Cat 18. The Mate 10 is one of the new generation of smartphones that is, at least partially, targeted at being your only computing device. It has a USB-C connection (see One Connector to Rule Them All: USB Type-C for more information) that can be connected to a display, either directly if the display supports it, or using an HDMI dongle. If it is going to be your only computing device, it had better have some serious compute power, and it does: an 8-core CPU and a 12-core GPU. But it also has a very high-end camera. Or rather cameras: Leica lenses, a 20MP monochrome sensor and a 12MP RGB sensor, and an f/1.6 aperture.
Inside is a HiSilicon Kirin 970 SoC (HiSilicon is the fabless semiconductor subsidiary of Huawei), manufactured in a 10nm process and containing 5.5 billion transistors. For image processing it uses a Tensilica Vision P6 DSP. It is also one of the first phones to contain a neural processing unit, another sign of where things are headed.
The image pipeline for any camera starts with between one and three image sensors, most typically two. These produce the raw image data. There is a lot of it, and it needs a lot of processing to get to something we would consider a photograph or a video, and more still to analyze the picture, such as identifying the people photographed or handling gesture recognition. The above diagram shows a typical pipeline, starting with the image sensors on the left, then basic image processing, then post-processing. Depending on the application, the post-processing can be intensive image processing, or alternatively neural networks for detecting what is in the scene.
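The flow of that pipeline can be sketched in a few lines. The stage names and the toy "frame" below are hypothetical stand-ins for the real, much heavier algorithms; the point is only the structure: raw sensor data, then basic image processing, then a post-processing or analysis stage:

```python
# A minimal sketch of the pipeline described above. Each function is a toy
# stand-in for a real stage; the structure, not the math, is the point.

def demosaic(raw):
    """Turn single-channel raw samples into RGB triples (toy version)."""
    return [(v, v, v) for v in raw]

def denoise(rgb):
    """Clamp out-of-range values -- a stand-in for real noise reduction."""
    return [tuple(min(max(c, 0), 255) for c in px) for px in rgb]

def detect(rgb):
    """Stand-in for a neural-network stage: flag unusually bright pixels."""
    return [sum(px) / 3 > 200 for px in rgb]

raw_frame = [30, 260, 180, -5]        # fake raw sensor samples
image = denoise(demosaic(raw_frame))  # basic image processing
labels = detect(image)                # post-processing / scene analysis
print(image, labels)
```

In a real phone, each of these stages runs on dedicated hardware or a vision DSP rather than a CPU, which is exactly where processors like the Vision P6 come in.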
It is still in its infancy, but another driver of vision processing is augmented reality (AR). AR requires identifying exactly what is in the scene, and where, in order to overlay additional information such as labels or instructions (or Pokémon). Kits such as ARCore from Google and ARKit from Apple are starting to make creating AR applications more feasible, but a lot of image processing is required before these higher-level applications can operate on the image.
Cadence has two Tensilica processors for pure image processing: the Vision P5 and the Vision P6. The P5 is for general-purpose imaging, with 64 MACs and an optional FPU. The P6 is higher performance, with 256 MACs and up to four times the performance of the P5. The P6 also has a 32-way SIMD vector FPU supporting 16-bit floating point, making it easy to port algorithms originally developed for GPUs.
The Vision C5 DSP is intended primarily for implementing neural networks. It can be configured with 1024 8-bit MACs or 512 16-bit MACs. The architecture is 4-way VLIW with 128-way 8-bit SIMD, integrated DMA, and dual load/store. It is not just a convolution accelerator; it has been designed specifically for neural network implementation. It requires only 1mm² to deliver 1 TeraMAC/second of compute in 14/16nm, and since the C5 can be configured into multi-processor systems, it scales easily to multi-TMAC/sec designs. Because it is fully programmable, the algorithms can be adapted to new developments in what is, today, an extremely fast-changing field.
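A quick sanity check shows where the 1 TeraMAC/second figure comes from: 1024 MACs per cycle at roughly a gigahertz clock (the clock rate here is my assumption, not stated in the article) gives about 1 TMAC/s, and additional cores multiply that directly:

```python
# Sanity-check of the C5 throughput claim: MACs/cycle times clock rate.
macs_per_cycle = 1024     # 8-bit MAC configuration, from the article
clock_hz = 1.0e9          # assumed ~1 GHz clock, not stated in the article

tmacs = macs_per_cycle * clock_hz / 1e12
print(f"{tmacs:.3f} TMAC/s per core")

# Multi-processor configurations scale roughly linearly with core count.
for cores in (1, 2, 4):
    print(cores, "cores ->", cores * tmacs, "TMAC/s")
```

So a four-core configuration lands in the multi-TMAC/s range the article mentions, without any exotic scaling assumptions.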
More information on the Tensilica vision processors is on the Cadence Vision DSPs page. There is also more information on the Huawei Mate 10 (the Mate 10 Pro is available in the US from mid-November).
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.