At the recent D&R IP-SoC Day 2022, Amol Borkar presented Designing the Next Ultra-Low Power Always-On Solution. Amol is a product marketing director for the Cadence Tensilica product line.
He started with some statistics about Tensilica. Tensilica presentations normally start with this, and usually, I omit it, but it's been a long time since I presented the numbers, so to bring you up to date:
The trends in processors differ by market. The most important markets seem to be automotive, robotics, drones, mobile, and AR/VR. Image sensors (aka cameras), lidar, and radar are all used heavily in automotive. Time-of-flight (ToF) sensors, for example, are used in AR/VR.
An area of growing importance and interest is always-on (AON), ultra-low-power computer vision and AI. The basic idea is to have a tiny AON processor, either in an on-chip AON island or as a separate little companion chip. The AON processor monitors things such as Ethernet, the gyroscope (to sense when a phone is picked up), and, increasingly, the microphone. There is also interest in using cameras for what are called "visual wake words," such as detecting a person. Once something of interest is detected, the AON processor wakes up the "main" processor to take over. For example, an AON processor might detect that a person is in front of the computer, and the main processor can then use more powerful techniques to see if the person is recognized.
In the distant past, the sensor would feed directly to the CPU. This required something like pressing a button to activate, since the CPU is too power-hungry to leave on all the time. The next step was to have the sensor feed a DSP and keep the CPU in standby until the DSP noticed something interesting. Now, the best approach is to have the sensor feed an AON processor and keep both the DSP and the CPU in standby.
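A back-of-the-envelope calculation shows why the sensor-to-AON arrangement wins. All of the power numbers and duty cycles below are invented for illustration; the point is only that average power is the duty-cycle-weighted sum of each block's active and standby power, so gating the hungry blocks behind a tiny always-on one dominates:

```python
def avg_power_mw(active_mw, standby_mw, duty):
    """Average power for a block that is active a `duty` fraction of the time."""
    return duty * active_mw + (1 - duty) * standby_mw

# Always-on CPU: full power, all the time.
cpu_always_on = avg_power_mw(500.0, 5.0, duty=1.0)

# DSP always on, CPU woken for 1% of the time.
dsp_gated = avg_power_mw(50.0, 1.0, duty=1.0) + avg_power_mw(500.0, 5.0, duty=0.01)

# AON always on, DSP woken 5% of the time, CPU woken 1% of the time.
aon_gated = (avg_power_mw(2.0, 0.1, duty=1.0)
             + avg_power_mw(50.0, 1.0, duty=0.05)
             + avg_power_mw(500.0, 5.0, duty=0.01))

print(aon_gated < dsp_gated < cpu_always_on)  # True
```

With these made-up figures the three architectures average roughly 500 mW, 60 mW, and 15 mW respectively; the real numbers depend on the silicon, but the ordering is the whole argument for AON.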
The requirement for an AON processor is to be small and low power. It runs continuously, actively monitoring sensors such as cameras (perhaps needing only 5 frames per second) or microphones. There needs to be a very "lite" network for AON usage, designed to run on microcontroller-class devices. The most common platform is TensorFlow Lite for Microcontrollers. One challenge for AON is that networks like this are not in the common model "zoos" of easily downloaded networks. You can't just load one of the common models, such as YOLO, into a low-power processor and get a solution suitable for AON. Most likely, the model will not fit, and if it does, it will consume too much power.
This table shows the difference between a typical model zoo network and a typical AON network. These are just typical numbers, not a specific design. As you can see, networks for AON processors are significantly smaller than typical model zoo networks. A model zoo network might have 100 layers; an AON network might have just 12.
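To make the "most likely, the model will not fit" point concrete, here is a rough sketch comparing an estimated weight footprint against an always-on memory budget. The parameter counts and the 256 KB budget are invented round numbers, not figures from the table or from any specific Tensilica configuration:

```python
def weight_bytes(params, bits_per_weight=8):
    """Approximate weight storage for a network quantized to bits_per_weight."""
    return params * bits_per_weight // 8

AON_BUDGET_BYTES = 256 * 1024        # illustrative AON SRAM budget

zoo_model_params = 60_000_000        # e.g., a large model-zoo detection network
aon_model_params = 250_000           # e.g., a tiny visual-wake-word network

print(weight_bytes(zoo_model_params) <= AON_BUDGET_BYTES)   # False: ~60 MB of weights
print(weight_bytes(aon_model_params) <= AON_BUDGET_BYTES)   # True: ~244 KB of weights
```

Even fully quantized to 8 bits, the zoo network's weights alone are hundreds of times over budget, before counting activations or code, which is why AON networks are designed from scratch with an order of magnitude fewer layers.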
There are three stages of device wake/unlock:
Let's focus on the detection phase. The base level has the best power and uses simplified algorithms, but it doesn't have enough compute for things like user identification. There is a lot of effort going into what Amol calls the "mid-range." This allows one modality (voice or vision, but not both, for example), which is enough for keyword spotting or face detection. Emerging are systems that require more than one modality. This can be done either with two AON processors, one per modality, or with a single more powerful design that can be repurposed for both modalities. The area of the bigger processor will probably be less than the area of two processors, but its power consumption may be higher.
Tensilica has two solutions to the dual-modality issue, depending on which one is most important. If audio is the dominant processing, then there is the family of Tensilica HiFi DSPs. If vision is the dominant modality, then there is the family of Tensilica Vision DSPs.
The leading solution for low-energy AON is the Tensilica Vision P1 DSP, the best-in-class low-energy DSP for vision and AI. For more details on this processor, see my post Tensilica Vision Q8 and P1 DSPs, More AND Less. The processor builds on the existing, successful Vision DSP architecture, but is smaller, with 70% area and power savings versus the Vision P6, and it uses the same software chain. Like all Tensilica processors, it has a scalable and configurable architecture, with 128-bit SIMD, up to 128 MACs of performance, and the Tensilica Instruction Extension (TIE). For more on TIE, see my post Custom Instructions in Tensilica: Wearing a TIE Makes You Smarter.
The P1 is low enough power to be always-on, yet powerful enough to provide features such as smart image sensing, face detection, keyword spotting, or foveated rendering.
Here's how you might set up a system for always-on face detection. The Vision P1 continuously performs a simple face detection algorithm, waking up the Vision DSP to do recognition and, if the face is recognized, waking up the host to unlock the phone or laptop.
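The flow above can be sketched as a two-stage cascade. The function names here are hypothetical placeholders for the real detectors, and the frames are stand-in dictionaries rather than actual camera data; only the control flow mirrors the described system:

```python
def p1_detect_face(frame):
    """Cheap always-on face detector, standing in for the Vision P1's job."""
    return frame.get("has_face", False)

def dsp_recognize(frame):
    """Heavier recognition, standing in for the Vision DSP woken on demand."""
    return frame.get("face_id")

def unlock_events(frames, owner_id="owner"):
    """Yield an unlock event only when a detected face is also recognized."""
    for frame in frames:
        if not p1_detect_face(frame):
            continue                          # Vision DSP stays in standby
        if dsp_recognize(frame) == owner_id:
            yield "unlock"                    # host wakes and unlocks

frames = [
    {"has_face": False},                      # nothing happens
    {"has_face": True, "face_id": "stranger"},  # DSP wakes, host does not
    {"has_face": True, "face_id": "owner"},     # DSP wakes, host unlocks
]
print(list(unlock_events(frames)))            # ['unlock']
```

The key property is that the expensive stage runs only on the small fraction of frames the cheap stage passes through, which is what keeps the average power at always-on levels.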
Above are some benchmarks using MLPerf Tiny v0.5 (the current version, v0.7, was only recently released). As you can see just by glancing at the graphs, the Vision P1 delivers many more inferences per second than the nearest competitor while using much less energy per inference.
See the Vision P1 product page.
Or, if you are going to the Embedded Vision Summit, Amol will be doing a repeat performance. Or if you are going to Embedded World in Nuremberg, he is also doing an encore there. And at CadenceLIVE Silicon Valley...yup, you guessed it.
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.