The Tensilica Vision Q7 DSP is the sixth-generation vision and AI DSP. It has an enhanced instruction set and additional hardware for SLAM applications. Cadence has a SLAM library that is optimized for the Vision Q7 DSP and lets SLAM applications be coded quickly and efficiently. Let's take a quick look at the processor itself, and then look at what SLAM is and what functionality the SLAM library provides.
The Vision Q7 DSP hits a sweet spot in terms of performance and power. It delivers much higher performance than a general-purpose CPU, which lacks the wide SIMD lanes needed for parallel pixel processing. GPUs typically have enough computational horsepower, but they are thermally costly, generating a lot of heat and draining batteries fast.
Above is a block diagram of the Vision Q7 DSP. Its headline numbers are:
First, let's take a look at what SLAM is. SLAM stands for Simultaneous Localization and Mapping. Using SLAM, robots build their own maps on the fly. A robot works out its position by aligning new data from sensors such as cameras with whatever sensor data it has already collected, building out a map for safe navigation as it goes.
In some ways, it is a bit like finding your way through a maze in a video game. At first, you have almost no information. As you proceed, you gather more data and can build out a partial map of the game world. The task has some of the same characteristics as driving: some parts of the map are fixed (walls, buildings) and other parts are moving (cars and pedestrians, or, in the game, weapons and combatants).
SLAM is an important technique for locating an object in space. In the three-dimensional world in which we live, an object can have up to six degrees of freedom. Three degrees of freedom are the location of the object in space (x, y, and z coordinates). The other three are the orientation of the object, with the axes traditionally named after the movements of a ship: pitch, roll, and yaw. Position and orientation together are often called the pose. SLAM allows a robot to navigate without requiring a complete detailed map. It is challenging since ideally you need a map for localization and pose calculation, and a good pose is required for mapping. So there is a chicken-and-egg problem in that everything affects everything else.
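To make the six degrees of freedom concrete, here is a minimal numpy sketch (not taken from the Cadence library) that packs x/y/z position and yaw/pitch/roll orientation into a standard 4x4 homogeneous pose transform:

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Compose a 3x3 rotation from yaw (z-axis), pitch (y-axis),
    and roll (x-axis) angles, given in radians."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def make_pose(x, y, z, yaw, pitch, roll):
    """Pack all six degrees of freedom into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = rotation_matrix(yaw, pitch, roll)
    T[:3, 3] = [x, y, z]
    return T

# A robot at (1, 2, 0), turned 90 degrees (yaw), sees a point 1m straight
# ahead in its own frame; the pose maps it into world coordinates.
pose = make_pose(1.0, 2.0, 0.0, np.pi / 2, 0.0, 0.0)
point_robot = np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous coordinates
point_world = pose @ point_robot               # -> (1, 3, 0)
```

Chaining such transforms as the robot moves is exactly how a SLAM system accumulates its trajectory estimate.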
One area where SLAM is important is advanced driver assistance systems (ADAS) and autonomous vehicles. But you might have a "robot" that uses SLAM in your home right now. Automatic vacuum cleaners, such as Roomba, use SLAM to create a map of the area to be cleaned. Of course, they have fewer degrees of freedom than a drone since they can't go up and down.
SLAM can be used alongside other location systems such as GPS to achieve greater accuracy. GPS is not always available (indoors, and around skyscrapers) and, in any case, its accuracy is not sufficient for navigation, which typically requires accuracy of a few inches or less. Another source of data is counting wheel rotations, known as wheel odometry.
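Wheel odometry amounts to simple dead reckoning. As an illustrative sketch (the wheel radius and wheel base are made-up values, not anything from the article), here is how a differential-drive robot can update its 2D pose from counted wheel revolutions:

```python
import math

WHEEL_RADIUS = 0.05  # meters (assumed value for illustration)
WHEEL_BASE = 0.30    # distance between the two wheels, meters (assumed)

def update_pose(x, y, theta, left_revs, right_revs):
    """Dead-reckon a differential-drive robot's pose from wheel revolutions."""
    d_left = 2 * math.pi * WHEEL_RADIUS * left_revs
    d_right = 2 * math.pi * WHEEL_RADIUS * right_revs
    d_center = (d_left + d_right) / 2          # distance moved by the midpoint
    d_theta = (d_right - d_left) / WHEEL_BASE  # change in heading
    # Midpoint approximation: advance along the average heading of the step.
    x += d_center * math.cos(theta + d_theta / 2)
    y += d_center * math.sin(theta + d_theta / 2)
    return x, y, theta + d_theta

# One full revolution of both wheels drives the robot straight ahead.
x, y, theta = update_pose(0.0, 0.0, 0.0, 1.0, 1.0)
```

Odometry drifts over time (wheels slip, radii are imprecise), which is why SLAM fuses it with vision or GPS rather than relying on it alone.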
A particular use case of SLAM is in augmented reality (AR), where once the device has located itself and identified objects on the map, additional data can be overlaid on the view. Today this is almost certainly a phone, but in the future, maybe we will have AR glasses.
The SLAM library for the Vision Q7 DSP allows high-performance SLAM applications to be built around sensors such as cameras, lidar, and radar. The image above shows some of the complexity in a real-world environment. There are thousands of corners that can be identified, and then over time, they can be matched as the robot moves through the environment.
The typical SLAM flow involves three main steps: key point detection and matching, camera pose estimation, and triangulation (putting it all together). In effect, these are the L and M of SLAM. Localization is working out how the camera is moving. Mapping is working out what all the detected points are and how they are moving. A variety of popular methods for detecting and describing corners are supported by the library. At the end of each frame, there are always residual errors from the algorithm, so both the mapped points and the pose need to be refined.
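The triangulation step can be illustrated with the classic two-view linear (DLT) method: given the two camera projection matrices from pose estimation and a matched point in each image, solve a small homogeneous system for the 3D point. This numpy sketch is a generic textbook version, not the library's implementation:

```python
import numpy as np

def triangulate(P1, P2, pt1, pt2):
    """Linear (DLT) triangulation of one matched point.
    P1, P2 are 3x4 camera projection matrices; pt1, pt2 are (u, v) pixels."""
    A = np.array([
        pt1[0] * P1[2] - P1[0],
        pt1[1] * P1[2] - P1[1],
        pt2[0] * P2[2] - P2[0],
        pt2[1] * P2[2] - P2[1],
    ])
    # The homogeneous 3D point is the right singular vector of A with the
    # smallest singular value (the null direction of A for exact data).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Toy setup: two cameras with identity intrinsics, the second one
# translated 1m along x, observing a point at (0.5, 0.2, 4.0).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
pt1 = X_true[:2] / X_true[2]                 # projection in camera 1
pt2 = (X_true[:2] - [1.0, 0.0]) / X_true[2]  # projection in camera 2
X_est = triangulate(P1, P2, pt1, pt2)        # recovers (0.5, 0.2, 4.0)
```

Repeating this over thousands of matched key points per frame is what makes a wide-SIMD DSP attractive for the workload.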
Processing vision data has a number of challenges. The first is that the amount of data to be processed is large, especially in larger vehicles with multiple sensors. At one level, that requires the images to be tiled so that each tile can be processed efficiently. There is a complete library for handling this, known as the full image kernel (FIK) library. The FIK framework automates image tiling and DMA scheduling, so a tile-level optimized function, whether from XiLib or written by a developer, can easily be expanded into a DMA-integrated kernel that processes the entire image. The framework simplifies moving images in and out of memory, using DMA to overlap data movement with computation.
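The tiling pattern itself is simple to sketch. This Python outline (my own illustration, not the FIK API; the tile size is an assumed value) shows the structure such a framework automates: walk the image tile by tile and apply a tile-level kernel to each, with the framework in the real system prefetching tile i+1 over DMA while tile i is being processed:

```python
import numpy as np

TILE_H, TILE_W = 64, 64  # tile size chosen to fit local DSP memory (assumed)

def iter_tiles(image):
    """Yield (origin, tile) pairs covering the whole image.
    Edge tiles may be smaller than TILE_H x TILE_W."""
    h, w = image.shape
    for r in range(0, h, TILE_H):
        for c in range(0, w, TILE_W):
            yield (r, c), image[r:r + TILE_H, c:c + TILE_W]

def process_image(image, tile_kernel):
    """Apply a tile-level kernel over a full image, tile by tile.
    In a real FIK-style framework, the DMA fetch of the next tile would
    overlap with the computation on the current one; this loop stands in
    for both stages."""
    out = np.empty_like(image)
    for (r, c), tile in iter_tiles(image):
        out[r:r + tile.shape[0], c:c + tile.shape[1]] = tile_kernel(tile)
    return out

# Example tile kernel: brightness inversion on an 8-bit image.
img = np.arange(256 * 256, dtype=np.uint8).reshape(256, 256)
result = process_image(img, lambda t: 255 - t)
```

The key design point is that the kernel only ever sees one tile-sized buffer, so it can live entirely in fast local memory while DMA hides the cost of streaming the full image through it.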
The diagram above shows a typical SLAM flow. The boxes on the left with green outlines are implemented in the SLAM library. The boxes on the right in the red outlines are specific to the application.
The library includes implementations for:
The library is actually split into two parts: the SLAM library itself and the full image kernel (FIK) library, which is also used for more traditional vision applications. Overall, the SLAM library contains over 100 optimized functions that a developer can use to build power- and energy-efficient SLAM applications that run on Vision Q7 DSP-based platforms at high frame rates.
The SLAM library and FIK library are available to all Vision Q7 DSP customers.
Download the white paper on SLAM and DSP implementation written by Amol Borkar. Or watch him explain it in the two-part video series Deep Dive on Simultaneous Localization and Mapping (SLAM):
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.