When the Tensilica HiFi DSP family was first created, the focus was all on low-power high-fidelity music. Hence the name of the product. I've written about this over the years:
So in that era Cadence partnered with a large number of companies that owned the audio codecs needed for high-end TV sound. Cadence has the broadest portfolio of audio partners.
But then what people expected out of an audio processor changed. I talked about that, too, in:
Instead of being used just to replay compressed music in an aesthetically pleasing way, audio processing started to be increasingly focused on listening, in particular being able to listen to commands, separate the audio from background noise, and then perform some level of speech recognition.
In the beginning, the focus on the device was "wake word recognition": detecting the trigger word or phrase that wakes up the device, such as "Alexa" or "Hey Siri".
At the level of the Tensilica DSP processors, that meant that in addition to doing great low-power digital signal processing (after all, that's what DSP stands for), there needed to be even more MAC (multiply-accumulate) units to handle the deep learning aspects of voice recognition.
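To see why MAC units dominate, consider that every layer of a neural net is essentially long chains of multiply-accumulates. Here is a minimal sketch in plain Python (the weights and inputs are made up purely for illustration) of a dense layer built from MAC operations:

```python
# A dense (fully connected) layer is just repeated multiply-accumulate (MAC)
# operations: each output is the dot product of the input vector with one
# weight row, plus a bias. DSP MAC units accelerate exactly this inner loop.
# All weights, biases, and inputs below are made-up illustrative values.

def mac_dot(weights, inputs, bias=0.0):
    """bias + sum(w*x), one MAC per weight."""
    acc = bias
    for w, x in zip(weights, inputs):
        acc += w * x  # one multiply-accumulate
    return acc

def dense_layer(weight_matrix, inputs, biases):
    """One output per weight row, i.e. len(weight_matrix) MAC chains."""
    return [mac_dot(row, inputs, b) for row, b in zip(weight_matrix, biases)]

# Tiny example: 2 outputs from a 3-element input.
W = [[0.5, -1.0, 2.0],
     [1.0,  0.0, 0.5]]
b = [0.1, -0.2]
x = [1.0, 2.0, 3.0]
print(dense_layer(W, x, b))
```

A real voice model runs millions of these MACs per second, which is why adding MAC throughput to the HiFi cores matters so much for on-device recognition.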
Once the wake word was detected, early implementations would clean up the audio and upload it to the cloud for processing. However, there is an increasing desire to do more voice recognition on the device. Partly, this is because communication to the cloud is not always present, and also there is an increasing reluctance to upload the raw sound data continuously, or near continuously. Parents are not comfortable having the cloud be the place to detect if their baby is crying. Triangulating gunshots requires precise timing.
Running inference at the edge, as opposed to just detecting a single wake word, is a much heavier workload, and programming the HiFi DSPs to handle it is correspondingly harder.
At the recent Linley Processor Conference, migrated online like all conferences these days, Cadence's Yipeng Liu presented Efficient Machine Learning on HiFi DSPs Using TensorFlow. TensorFlow, originally developed by Google and now open source, is one of the best-known frameworks for training neural networks. Two other well-known frameworks are PyTorch (the most popular in academia) and Caffe. TensorFlow has a slimmed-down sibling, TensorFlow Lite, and a still more limited version, TensorFlow Lite for Microcontrollers (TFLM), which is targeted at microcontrollers, not heavy-hitting data center CPUs and GPUs.
For more information on this process (mostly in the vision space), see my posts:
In the audio space, the sensor is not a camera but a microphone. Important activities are speech enhancement, noise reduction, detection (wake words, crying babies, gunshots), identifying background music, and more. All these activities started in the cloud when edge devices had limited processing power, but as edge processing has become more powerful, privacy, availability, and latency have driven more and more of the processing into the devices themselves. Yipeng talked about the end-to-end solution that combines the HiFi DSP core, TensorFlow Lite for Microcontrollers, and Cadence's own middleware that joins up the pieces. The HiFi cores are the leading audio solution: Cadence's more than 100 licensees ship 1-2 billion cores per year. The challenge is that developing a machine learning solution can take a lot of time, even though the DSP is very capable of running one. This flow breaks down that barrier.
The diagram above shows the flow. Training is done in the cloud. Inference is done on the HiFi-based SoC. The off-white boxes are parts of TFLM. The red boxes are Cadence's HiFi and neural network infrastructure. The blue boxes are provided by the system or SoC designer.
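One detail of that on-device inference stage is worth unpacking: TFLM models typically run with 8-bit quantized weights and activations, which is what makes them fit in a microcontroller-class memory budget. Here is a sketch of the affine quantization scheme TensorFlow Lite uses (the scale and zero-point values below are invented for illustration, not taken from any real model):

```python
# TensorFlow Lite represents a real value r as an int8 value q via
#     r ~= scale * (q - zero_point)
# The scale and zero_point used here are made-up illustrative values.

def quantize(r, scale, zero_point):
    """Map a real value to the nearest int8 code, clamped to [-128, 127]."""
    q = round(r / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Recover the (approximate) real value from its int8 code."""
    return scale * (q - zero_point)

scale, zero_point = 0.05, 10   # illustrative parameters
r = 1.23
q = quantize(r, scale, zero_point)
r_back = dequantize(q, scale, zero_point)
print(q, r_back)   # the round trip is accurate to within scale/2
```

The payoff is that the MAC-heavy inner loops then run on 8-bit integers, which is exactly what the HiFi DSP's SIMD MAC units are good at.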
This can be used very effectively and quickly. For example, Yipeng told us about elevators in China just this year:
We’re all staying home trying not to infect each other. One thing found in China was that elevators were a source of infection (from touching the buttons). So one company undertook rapid development of voice commands to control elevators. In most cases, elevators do not have network access. So in a few days a team of engineers developed a solution that overcame the lack of network access and the challenge of a noisy environment (an enclosed space with hard walls, with different noise as the elevator goes up and down). It was deployed in Beijing, converting the elevator to voice control. I am proud to say it is using a HiFi 4 DSP voice AI processor running one of our ecosystem partners’ solutions. Many systems of this type will be deployed throughout China and the world.
As a pipecleaner, Cadence took example code from TFLM: take a signal, compute the spectrogram, then run the inference engine on the spectrogram. This was run on an NXP RT600 reference board (one you can buy, not something proprietary). The SoC on the board contains a HiFi 4 DSP. Yipeng had planned to run the demo for us live...but the board was in the office, so she didn't have it to show us.
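The first stage of that pipecleaner, turning an audio signal into a spectrogram, can be sketched in pure Python. This is a toy version with small frame sizes and a naive DFT; a real implementation would use an optimized FFT kernel on the DSP:

```python
# Spectrogram sketch: slide a window over the signal and take the magnitude
# spectrum of each frame. Toy frame sizes; a naive DFT stands in for the
# optimized FFT a real DSP implementation would use.
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame via a direct DFT (bins 0..n/2)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(signal, frame_len=64, hop=32):
    """One magnitude spectrum per (overlapping) frame of the signal."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return [dft_magnitudes(f) for f in frames]

# Toy input: a pure sine at 8 cycles per 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(signal)

# The energy should concentrate in bin 8, the sine's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # 8
```

The resulting time-frequency grid is what gets fed to the inference engine as the model's input tensor, in the same spirit as the TFLM example Yipeng described.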
In the Q&A, she was asked about battery life. She said that they have a tablet in the office that they called up every day, and the battery stayed up for a month. So yes, power-efficient.
Another, more complex, example is a smart speaker design, with cloud connectivity as well as on-device inference, the capability to play music, and all the features you expect from a smart speaker.
See the Tensilica HiFi Audio DSP page.
Or watch Yipeng's presentation:
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.