
Cadence on the Beat

MeeraC | 19 Sep 2018

Seeing Sound

You may know something about deep learning and machine learning when it comes to visual applications. Using various filters (or convolutions) and other ways of interpreting visual data, the layers that process that data can, for example, identify the subject of a picture.

An example of a convolutional neural network (CNN)

But what about audio data? Machine learning clearly works on audio, because we have Siri, Alexa, and the rest. So how do we apply what we know about processing images to processing sound for natural language processing (NLP) systems?

Wave Breakdown

Just as a convolution layer filters an image down to its relevant features, an audio signal is filtered into different frequency segments before being fed into a neural network. You’ve seen 2D or 3D waveforms of the music you’re listening to, right? That is the kind of information fed into the neural network. From there, you have a “visual” image of sound, which can then be processed just as an image would be, whether you’re identifying the sound, removing background noise, or processing speech.

Examples of waveforms, or audio waves

These images, or spectrograms, are

“…images representing sequences of spectra with time along one axis, frequency along the other, and brightness or color representing the strength of a frequency component at each time frame. This representation is thus at least suggestive that some of the convolutional neural network architectures for images could be applied directly to sound.” [L. Wyse]
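To make that concrete, here is a minimal sketch (my own, not from the post) of computing such a spectrogram from a raw waveform with SciPy. The file name and window settings are illustrative assumptions.

```python
# A minimal sketch of turning a raw waveform into a spectrogram "image"
# that a neural network could consume. "nice_work.wav" and the window
# parameters are hypothetical.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

sample_rate, samples = wavfile.read("nice_work.wav")  # hypothetical short clip
if samples.ndim > 1:                                   # mix stereo down to mono
    samples = samples.mean(axis=1)

# Slice the signal into short overlapping windows and measure the energy in
# each frequency band: time along one axis, frequency along the other.
frequencies, times, spec = spectrogram(samples, fs=sample_rate,
                                       nperseg=512, noverlap=256)

# Log-scale the magnitudes so quiet components remain visible, much like the
# brightness or color in a spectrogram heat map.
log_spec = 10 * np.log10(spec + 1e-10)
print(log_spec.shape)  # (frequency bins, time frames): a 2D "image" of sound
```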

As an example of a spectrogram to look at, here is a one-second clip of a man’s voice saying, “Nice work.”

This heat map shows a pattern in the voice rising above the x-axis. A neural network can learn these kinds of patterns and classify sounds based on the patterns it recognizes.
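As a rough illustration of how a network might consume those patterns, here is a toy PyTorch sketch. The input size (128 frequency bins by 64 time frames) and the ten output classes are assumptions made for the example, not a description of any particular system.

```python
# Toy sketch: treat a log-spectrogram like a grayscale image and classify it.
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # After two 2x2 poolings, a 128x64 input becomes 32 channels of 32x16.
        self.classifier = nn.Linear(32 * 32 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)        # convolutions pick out time-frequency patterns
        x = x.flatten(start_dim=1)  # flatten feature maps into one vector
        return self.classifier(x)   # scores for each sound class

model = SpectrogramCNN()
dummy_batch = torch.randn(4, 1, 128, 64)  # four stand-in spectrogram "images"
print(model(dummy_batch).shape)           # torch.Size([4, 10])
```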

To quote Daniel Rothmann in his article, “The promise of AI in audio processing”,

Essentially, we are taking audio, translating it to images and performing visual processing on that image before translating it back to audio. So, we are doing machine vision to do machine hearing.

In other words, we are seeing sound.
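Here is a minimal sketch of that round trip, using SciPy's STFT and inverse STFT, with a crude noise gate standing in for the "visual processing" step. The file names and threshold are hypothetical, and a real denoiser would be far more sophisticated.

```python
# Sketch of the audio -> image -> processing -> audio loop described above.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

rate, audio = wavfile.read("noisy.wav")   # hypothetical input file
audio = audio.astype(np.float32)

# "See" the sound: a complex spectrogram with time and frequency axes.
freqs, times, Z = stft(audio, fs=rate, nperseg=512)

# Do "machine vision" on the image: here, a crude noise gate that zeroes out
# any time-frequency cell below 2% of the loudest one.
magnitude = np.abs(Z)
mask = magnitude > 0.02 * magnitude.max()
Z_cleaned = Z * mask

# Translate the image back into audio.
_, cleaned_audio = istft(Z_cleaned, fs=rate, nperseg=512)
wavfile.write("cleaned.wav", rate, cleaned_audio.astype(np.float32))
```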

Note that the heatmap above is for a short clip of audio, and the inherent problem is this: If you freeze a frame of, say, a movie, you get an image from which you can glean a lot of information, including being able to feed the image into a neural net. If you freeze a “frame” of audio, however, it means nothing. Audio is dependent on the “time” axis.
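One common way to give a network that time context (a sketch under my own assumptions, not something from the post) is to hand it a sliding window of consecutive spectrogram frames rather than a single frozen one:

```python
# A single spectrogram column (one frozen "frame" of audio) carries almost no
# meaning, so group consecutive frames into overlapping context windows.
import numpy as np

def frame_windows(log_spec, context=16, hop=8):
    """Slice a (freq_bins x time_frames) spectrogram into overlapping chunks
    of `context` frames so a model can see how the sound evolves over time."""
    windows = []
    for start in range(0, log_spec.shape[1] - context + 1, hop):
        windows.append(log_spec[:, start:start + context])
    return np.stack(windows)  # (num_windows, freq_bins, context)

log_spec = np.random.randn(128, 100)  # stand-in for a real log-spectrogram
print(frame_windows(log_spec).shape)  # (11, 128, 16)
```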

Now imagine a sound clip with multiple voices, background noises, foreground noises, overlapping voices, and all kinds of sounds mixed together in one chunk of audio. Distinguishing each element and then identifying it takes an enormous amount of processing. Then comes working out what the voices are saying, and not only what they are saying but what they mean. Imagine the processing power required for each segment of audio the neural network handles.

This is where Cadence Tensilica HiFi DSPs for audio, voice, and speech come in, with more than 225 audio, voice, speech recognition, and voice enhancement software packages already ported to the HiFi DSP architecture. Audio applications present unique problems that must be addressed specifically by the DSP, and Cadence is on the leading edge of what is possible.

—Meera


What I read for this post:

  • 10 Audio Processing Tasks to get you started with Deep Learning Applications
  • Audio spectrogram representations for processing with Convolutional Neural Networks
  • Getting Started with Audio Data Analysis Using Deep Learning
  • The Promise of AI in Audio Processing
  • Using Deep Learning for Sound Classification: An In-Depth Analysis
Tags:
  • audio
  • Cadence on the Beat
  • HiFi
  • Tensilica