TWS Earbud Design Is About Scaling

4 Apr 2022 • 5 minute read

Introduction

Various factors are fueling the continued growth of True Wireless Stereo (TWS) earbud shipments; an important one is the jettisoning of the audio jack on leading smartphones. Growth is expected in all segments, spanning base models to mid-tier and premium models.

OEMs and SoC providers of TWS earbuds need to prepare to offer the quality and features the market expects of next-generation products. Here, we explore the architecture of the signal path and how to scale it across different product tiers.

Questions in the minds of OEMs and SoC providers

The signal processing requirements will vary significantly across these tiers. This poses a conundrum for an OEM or SoC provider: “How do I address these varied requirements? Would they entail different architectures, each with its own system and software framework? How many vendors do I need to work with to create, say, three SKUs: Good-Enough, Better, and Best? Finally, how do I get all this done with the limited resources available? Or do I need to simplify my problem and limit my TAM by betting on just one segment?”

It’s about scaling.

Product requirements

Let’s illustrate this with hypothetical requirements for the three SKUs, shown in Table 1:

Feature	Sub-feature	Good Enough	Better	Best
Music Playback Time	LC3, SBC, OPUS	X	X	X
Talk Time	LC3, mSBC	X	X	X
Noise Reduction	Suppression	X	XX	XXX
Noise Cancelation	ANC		X	XXX
Voice Command	Simple/ASR	X	XX	XXX (limited ASR)
Sound Analytics	Glass break, Emergency Vehicle approaching, and more			XXX
Context Adaptation			XX	XXX
Sound Stage			X	XXX
Cost Tolerance	DSP, CPU	$	$+	$$+
Battery Capacity		X	X+	XX
Power On/Off?	No → Always-on	X	X	X

Table 1: Product requirements for three SKUs

A basic requirement of all three SKUs is long music playback and talk time with support for the applicable codecs. Beyond that, the requirements diverge in feature availability and sophistication. The table captures the trend that consumers want to interact with their earbuds through voice commands and would appreciate being able to speak somewhat naturally when issuing them, even if in a limited way. Hence, the SKUs support different sets of voice commands with different vocabulary sizes, and the premium SKU supports limited Automatic Speech Recognition (ASR). Cost tolerance and battery capacity also vary across our imaginary SKUs. Given these requirements, how do we arrange an architecture that will scale across the SKUs?

Base model

Let’s start by targeting the “Good Enough” TWS earbud design. As the last row of Table 1 indicates, there is no power button, which makes the earbuds an always-listening accessory for all SKUs. Take them out of the charging case and they are all ears from then on; everyone deserves hands-free convenience.

Figure 1 shows a single DSP in an always-on configuration performing sensor fusion and voice processing continuously.

Figure 1: Always-on DSP Configuration

The DSP keeps the rest of the system powered down while it watches for commands from the microphone or the other sensors. When the user wants music playback or call processing, the DSP wakes up the rest of the system and enters active mode. In this case, the DSP (a Cadence Tensilica HiFi 1 in this illustration) also serves as the main application processor, as it can run control code efficiently. This eliminates the need for a separate CPU, leading to a smaller design and lower power.
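The two modes described above can be pictured as a small state machine. The following is only a minimal sketch in Python, not firmware for any real device: the event names (`WAKE_WORD`, `SENSOR_TAP`, `SESSION_ENDED`) are hypothetical and stand in for whatever triggers an actual product would use.

```python
from enum import Enum, auto


class Mode(Enum):
    ALWAYS_ON = auto()  # DSP listens at low power; rest of the SoC is powered down
    ACTIVE = auto()     # rest of the SoC is awake for playback or a call


class Event(Enum):
    WAKE_WORD = auto()      # hypothetical: voice trigger detected on the microphone
    SENSOR_TAP = auto()     # hypothetical: gesture reported by sensor fusion
    SESSION_ENDED = auto()  # hypothetical: playback or call has finished


# Transition table: (current mode, event) -> next mode.
TRANSITIONS = {
    (Mode.ALWAYS_ON, Event.WAKE_WORD): Mode.ACTIVE,
    (Mode.ALWAYS_ON, Event.SENSOR_TAP): Mode.ACTIVE,
    (Mode.ACTIVE, Event.SESSION_ENDED): Mode.ALWAYS_ON,
}


def next_mode(mode, event):
    """Return the next mode; events with no listed transition leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


def run(events, start=Mode.ALWAYS_ON):
    """Apply a sequence of events and return the list of modes visited."""
    modes = [start]
    for event in events:
        modes.append(next_mode(modes[-1], event))
    return modes
```

For example, `run([Event.SENSOR_TAP, Event.SESSION_ENDED])` visits always-on, active, and always-on again. Keeping the transitions in a table makes it easy to add further modes for richer SKUs without touching the control logic.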

Frequency scaling

In always-on mode, the DSP may run at a low clock frequency to further conserve energy. When entering active mode (sometimes called turbo mode), HiFi 1 can raise its clock frequency to accommodate the higher workload of music playback or call processing. It may raise the clock frequency even further if it is also performing noise suppression or cancelation, such as ANC. The latter is important to maintain good audio quality. This meets the noise reduction requirement in Table 1.

The clock frequency can be raised even higher to support simple voice commands beyond wake-word detection. Dynamic voltage and frequency scaling (DVFS) can be used to raise or lower the frequency depending on the use case currently active, which keeps energy consumption to the necessary minimum. HiFi 1 can be synthesized to run at up to 1 GHz, depending on the technology, to accommodate various levels of concurrent workloads, many of which may be AI based or hybrid DSP/AI.
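To make the DVFS idea concrete, here is a minimal sketch of how a power manager might pick the slowest operating point that still covers the active use cases. Every number below (operating points, per-use-case loads, headroom factor) is a made-up assumption for illustration; none comes from HiFi 1 documentation or measurements.

```python
# Hypothetical operating points: (clock in MHz, core voltage in volts), sorted by clock.
OPERATING_POINTS = [(50, 0.60), (200, 0.70), (400, 0.80), (1000, 1.00)]

# Hypothetical per-use-case processing load in MHz; real figures depend on the
# codec, the algorithms, and the silicon.
LOAD_MHZ = {
    "wake_word": 20,
    "music_playback": 80,
    "call_processing": 120,
    "noise_suppression": 150,
    "voice_commands": 100,
}


def required_mhz(active_use_cases):
    """Sum the assumed load of every use case that is currently active."""
    return sum(LOAD_MHZ[name] for name in active_use_cases)


def select_operating_point(active_use_cases, headroom=1.2):
    """Pick the slowest operating point whose clock covers the load plus headroom."""
    needed = required_mhz(active_use_cases) * headroom
    for clock_mhz, volts in OPERATING_POINTS:
        if clock_mhz >= needed:
            return clock_mhz, volts
    raise ValueError("workload exceeds the fastest operating point")
```

With these assumed figures, wake-word detection alone selects the 50 MHz point, while adding music playback moves the DSP up to the 200 MHz point. Since dynamic power grows roughly with frequency times voltage squared, staying at the lowest sufficient point is what keeps energy use down.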

AI

This brings us to AI. Many modern applications, such as voice commands and noise suppression, are AI based. How do we run them on an always-on DSP? Despite being compact, HiFi 1 supports neural network (NN) workloads through NN instructions and architectural support, and it runs them with high energy efficiency. Benchmarks such as “OK Google” keyword spotting and TensorFlow Lite for Microcontrollers (TFLM) person detection show significant energy savings over other HiFi DSPs; therefore, many NN applications can run in always-on mode.

Integrating the BT controller function

An additional cost-optimization step might be to subsume the BT controller functionality into the DSP, especially if the DSP is efficient at running control code. This eliminates yet another block, further reducing area, power, and energy. It would leave HiFi 1 as the only processor in the system, giving the smallest configuration. Because this also eliminates the controller’s memory (ROM/RAM) from the controller or CPU subsystem, it yields a significant PPA benefit. Where cost and battery size matter most, these measures help achieve those goals.

Fewer processors also mean fewer tools and fewer disparate environments, making software development a lot easier. So, let’s talk software.

Software

Software should not be an afterthought, as it is typically the long pole in the tent. For this reason, HiFi 1 maintains software compatibility with current and previous generation HiFi DSPs. Product managers will appreciate that the large array of software developed by Cadence/Tensilica® partners on the other HiFi DSPs is immediately available on HiFi 1. These software applications, developed by domain experts, are optimized to run on HiFi DSPs using the Cadence Neural Network libraries (NNlib) and DSP libraries (NDSP lib). Fast time to market for OEMs through partner applications is a cornerstone of HiFi success.

Conclusion

We have embarked on what we hope will be a scalable signal processing architecture for TWS earbuds. For the Good-Enough earbuds, we added features beyond current-generation base models and fleshed out an economical architecture that supports always-on and active modes of operation to maximize battery life.

Having pared the TWS SoC design down to a single processing entity (HiFi 1 in this illustration), one might be tempted to go home and exclaim, “Honey, I shrunk the TWS earbud SoC!”

What’s next

Next time, we will look at what it takes to scale this architecture for the mid- and high-end product segments. Stay tuned.
