See Further by Standing on the Shoulders of...OpenVX

6 Apr 2017 • 5 minute read

For power and performance reasons, video and vision processing are attractive to offload from the main processor in a datacenter or in an embedded device. An Intel x86 server processor, or a 64-bit ARM^® processor are not the best tools for the task. They don't have enough performance, and in the mobile case the power consumption is too high. One challenge is how to describe the vision processing that is required, without losing all the gains from acceleration due to excessive communication between the main processor and the accelerator.

This is one area that the Khronos group has been working to address with the OpenVX API. If you haven't heard of the Khronos group, in their own words:

The Khronos Group was founded in 2000 to provide a structure for key industry players to cooperate in the creation of open standards that deliver on the promise of cross-platform technology. Today, Khronos is a not-for-profit, member-funded consortium dedicated to the creation of royalty-free open standards for graphics, parallel computing, vision processing, and dynamic media on a wide variety of platforms from the desktop to embedded and safety-critical devices.

To put that into perspective, here are the logos of the member companies, with the members active in the OpenVX working group ringed in red, and with companies that have conformant OpenVX implementations ringed in green (obviously, some companies, such as Cadence, have both red and green rings).

At today's Linley Autonomous Conference (fka Linley Mobile) Cadence's Frank Brill will present OpenVX: An Industry-Standard Computer Vision API for Autonomous Hardware. Although part of the message is that the Tensilica Vision P5 and P6 DSPs are ideal processors for implementing OpenVX offload, Frank's presentation is also tutorial in nature, assuming that most of his audience won't have in-depth knowledge of the topic already.

If you are implementing vision algorithms, or pretty much any algorithm, on a general-purpose processor with a single thread, then the typical programming model is to issue orders, "do this, do that, now do the other". More technically, this is known as "immediate function call programming model." Each operation is performed, and when it is complete, the next one is started. Since there is a shared memory, when an intermediate result is computed at one stage and then reused in the next, then it is left in memory by the first operation for the subsequent one to pick up.

This model doesn't work well for an offload processor, for several reasons:

The operations, and the order in which they are executed, might not be optimal for the offload processor
Intermediate results need to be transferred back to the host processor since they might be required
When the image gets too large to fit in host memory simultaneously, it needs to be tiled, and programming this is tedious and error-prone
There is a lot of unnecessary host processor interaction
Coding this by hand is not portable, the hardware on which it is running "shows through"

OpenVX solves these problems by breaking vision processing down into a two-phase process. In the first phase, a graph of image operations (known as nodes), which can be on any hardware and coded in any language (for example, on a GPU perhaps coded in CUDA). In the second phase, the graph is executed. The above diagram shows a graph for feature extraction.

The above code (don't worry, you don't have to understand the details) shows an edge detector. In the first few statements, the graph is assembled. One important feature is that a "virtualImage" only needs to reside in the accelerator—in a sense it is a promise by the host processor never to access it. Next, the "graph mapper" checks and optimizes the graph, with a view of the entire operation to be performed. Finally, the "runtime" processes the vision data (in this case, detecting edges).

The advantage of this approach is that the user doesn't need to handle DMA, tile overlap, local memory management, special hardware features like scatter-gather. This is all handled by the OpenVX framework.

There's much more than this, way too much to cover in a single post that is not intended to be a full training course. Two important extension areas are to:

Neural networks, adding nodes (remember, node is VX-speak for an image operation) for common neural network operations such as convolution or activation
OpenVX SC, for safety critical. This leverages the capability to build the graph on a host (cloud) and then deploy it on another (car)

So that's a whirlwind tour of OpenVX. What does Cadence do specifically? Cadence provides an application programmer's kit (APK) including:

A front-end module compliant to OpenVX 1.1, you can call any OpenVX API function to invoke the front end module
A graph mapper that translates the OpenVX graph into a script that can run on Tensilica Vision DSPs
Runtime that executes the scripts
XI library that executes the nodes in a highly optimized fashion

Under the hood, the above diagram shows where the various components operate. On the left is the host processor; on the right is the Vision DSP. Communication is done through shared memory (for the vision data and intermediate results) and directly through interprocess communication to trigger the operations.

Although OpenVX is an open hardware-independent specification, the underlying hardware still matters for power and performance. The diagram above shows the architecture of the Tensilica Vision P6 processor, which is optimized for vision applications such as those required for everyone's favorite application area, ADAS and autonomous vehicles.

For more details on the Vision Px processors, and OpenVX, see the Tensilica Vision page.

"+ res.PreviousPostTitle); // //NextPostUrl // //Previousposturl // } // }); }); if ( $('.blog-post.nextweb-blog-post .ifrmesrc').length ) { iframeattr = $('.blog-post.nextweb-blog-post .ifrmesrc'); markup = ''; $('.blog-post-content .ifrmesrc').html(markup); $('.blog-post.nextweb-blog-post .ifrmesrc').show(); } -->