The last article examined the datapath design of up-featured base-model TWS earbuds. That design featured a single DSP that scaled from always-on power modes, waiting for sensory inputs that signal user intent, to active operation for audio calls or music listening. The architecture delivered long battery life for Good-enough earbuds. We now look to scale the architecture from Good-enough earbuds to Better (mid-tier) and Best (high-end) models. While adding features and improving the user experience, we hope to preserve the software investment made by OEMs and SoC vendors. This is essential for fast time-to-market as we reach for a larger share of the total addressable market (TAM).
Nothing detracts more from the musical experience than ambient noise that seeps in and gets past the noise cancellation. Similarly, ambient noise or extraneous sound creeping into phone conversations can be quite disturbing. Because noise is so prevalent in our daily lives, noise suppression and cancellation performance are among the most important factors consumers consider when buying new earbuds or headphones. Consumers gain value if their conference call participants do not hear the gardener’s loud leaf blower or the deafening sound of an overhead airplane. In these times of working from home, those participants would certainly appreciate not hearing your rambunctious Roomba rubbing your floors.
If they can meet noise and unwanted sounds and treat them both as imposters, the earbuds would be truly Better. For superior musical experience and call clarity, high-quality earbuds can benefit from more powerful algorithms that can tame the noise to near silence.
…and the Best TWS earbuds are merely the players. The Best earbuds might additionally take the noise-suppressed stereo audio, up-convert it to multi-channel, multi-object audio, and place that around the user’s head, creating a rich, spatially wide, and elevated “take-it-with-you” sound stage. Whether you go jogging or globe-trotting, the sound stage is always around you. The whole world is now your stage.
Suppressing noise to a higher level and spatializing sound to create a 3D stage can be performance intensive. Quality gains from these algorithms will scale depending on how much performance one can throw at them. In addition, with features such as sound analytics or identification and local ASR, the demand for performance increases significantly. Where will the extra performance come from to support the battery-life-first and feature-rich designs?
The AON DSP shown in the architecture of the Good-enough earbuds will likely run out of oomph as processing demands grow beyond simple music playback or phone calls. Why is that? The AON DSP must be small to keep leakage to a minimum, because it is always functioning at some level (see Frequency Scaling in the previous article). In simple terms, leakage is a small but constant current that passes through a semiconductor device whenever it is powered on, independent of the active operation of the device. Leakage depends on DSP area and is generally higher in technology nodes with smaller geometries that try to push performance boundaries. Just as a small leak can sink a big ship, a small leakage current can drain a battery, albeit at a slow rate.
It therefore behooves the AON DSP to keep leakage at an ultra-low level. It is better to spend as much of the battery charge as possible on active, i.e., useful, modes of operation than to let it be whittled away by wasteful leakage. We can achieve this by keeping the AON DSP as small as possible. In turn, the size constraint limits the number of instructions and data types the DSP can include. With such performance limitations, supporting advanced features or higher-quality audio and speech processing becomes almost impossible.
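To put rough numbers on the leakage argument, here is a minimal back-of-the-envelope sketch in Python. Every figure in it (battery capacity, active current, leakage currents, duty cycle) is a hypothetical value chosen for illustration and is not a measurement of any HiFi DSP. The function name `battery_life_hours` is likewise an assumption made for this example.

```python
def battery_life_hours(capacity_mah, active_ma, leakage_ma, duty_cycle):
    """Estimate battery life for a core that is active part of the time.

    Leakage flows whenever the domain is powered, so it is charged for the
    whole time; active current is charged only for the duty-cycle fraction.
    """
    average_ma = leakage_ma + active_ma * duty_cycle
    return capacity_mah / average_ma


# Hypothetical figures, for illustration only.
CAPACITY_MAH = 50.0   # small earbud cell
ACTIVE_MA = 4.0       # current drawn while processing audio
DUTY_CYCLE = 0.25     # fraction of time spent actively processing

# Same workload, two always-on cores that differ only in leakage.
small_core = battery_life_hours(CAPACITY_MAH, ACTIVE_MA, 0.25, DUTY_CYCLE)
large_core = battery_life_hours(CAPACITY_MAH, ACTIVE_MA, 1.0, DUTY_CYCLE)

print(f"small always-on core: {small_core:.1f} h")
print(f"large always-on core: {large_core:.1f} h")
```

With these assumed numbers, the only difference between the two cases is the leakage term, yet it changes the estimated battery life substantially. That is the reason for keeping the always-on core small.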
So how do we overcome this? Is it the end of the road for the AON DSP?
Figure 2 shows the architecture of an AON DSP cascaded with an additional DSP.
Figure 2: Big-Little Architecture
In the Big-Little architecture, the HiFi 1 residing in the ultra-low energy domain continues to provide always-on functionality, just as in the Good-enough earbuds. For these functions, which include Wake-on-Voice, LC3 Encode and Playback, and ANC, HiFi 1 does not need the high-performance domain to be active and therefore keeps it in power-down mode. Thus, the Big-Little architecture accrues the same battery-saving benefit as the Good-enough earbuds. Music listening and phone calls continue to be satisfactory in relatively noise-free environments, such as one’s home.
Enter a public place, and the illusion of immersion is shattered. Traffic sounds and the loud diners at the next table overcome your senses, and the music or phone call feels distant. Is the AON DSP overloaded and unable to keep up with the surrounding chaos?
Fortunately, the more capable Big DSP handles this well. The AON DSP wakes up the Big DSP and hands over the noise suppression task. The AON DSP continually monitors the environment with attached sensors and microphones, performs sensor fusion, and classifies the present context. It only wakes the Big DSP if the context indicates the need for more noise suppression. The Big DSP handles the more challenging noisy situation, continuing to provide call and music clarity until the AON DSP detects quieter conditions and returns the Big DSP to power-down mode. The separate power domains of the AON and Big DSPs make for frugal battery use.
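The wake-and-release behavior described above can be sketched as a small state machine. This is only an illustration under stated assumptions: a single noise-level estimate in dB stands in for the full sensor-fusion context classifier, and the class name `AonPowerController`, the thresholds, and the quiet-frame count are all hypothetical. Two thresholds are used (hysteresis) so that the Big DSP does not toggle when the noise hovers near one level.

```python
from enum import Enum


class BigDspState(Enum):
    POWER_DOWN = 0
    ACTIVE = 1


class AonPowerController:
    """Hypothetical AON-side policy that gates the Big DSP on noise level."""

    def __init__(self, wake_db=70.0, sleep_db=60.0, quiet_frames_needed=3):
        self.wake_db = wake_db                    # wake at or above this level
        self.sleep_db = sleep_db                  # count as quiet at or below
        self.quiet_frames_needed = quiet_frames_needed
        self.state = BigDspState.POWER_DOWN
        self._quiet_frames = 0

    def update(self, noise_db):
        """Feed one noise estimate and return the resulting Big DSP state."""
        if self.state is BigDspState.POWER_DOWN:
            if noise_db >= self.wake_db:
                self.state = BigDspState.ACTIVE   # hand over noise suppression
                self._quiet_frames = 0
        elif noise_db <= self.sleep_db:
            self._quiet_frames += 1
            if self._quiet_frames >= self.quiet_frames_needed:
                self.state = BigDspState.POWER_DOWN
        else:
            self._quiet_frames = 0                # still noisy, stay awake
        return self.state


controller = AonPowerController()
levels = [55, 65, 72, 68, 58, 57, 66, 59, 58, 57, 56]  # made-up estimates
trace = [controller.update(level).name for level in levels]
print(trace)
```

Requiring several consecutive quiet frames before powering down is one possible design choice. It trades a little extra Big DSP on-time for fewer wake-ups.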
When the user prefers to expand the stereo to a sound stage, or if the content indicates this, the Big DSP can again spring into action to provide an exceptional user experience. The context reported by the AON DSP may also call for sound analytics. The Big DSP can furnish the required horsepower to turn on sound analytics (e.g., when walking outside, it can treat the sound of vehicle horns as an essential sound for a safety alert).
Applications such as 3D spatialization are generally based on traditional DSP algorithms, and they take significant compute cycles. In contrast, noise suppression and ASR lean increasingly on AI techniques. For example, local ASR may extract features such as MFCCs or MFSCs with DSP techniques, followed by inferencing with a neural network trained on speech in the presence of various types of noise. A Big DSP performing both functions needs to be adept at both types of algorithms.
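As a rough illustration of the traditional-DSP half of that pipeline, the sketch below computes MFSCs (log mel filterbank energies) and MFCCs for a single frame. It is a textbook-style outline, not an optimized or vendor-specific implementation. It uses a naive DFT in pure Python for readability, and the frame length, sample rate, filter count, and function names are assumptions made for this example. The neural network that would consume these features is omitted.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters with edges spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bins = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional bins
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfsc(frame, bank):
    """Log mel filterbank energies (MFSCs) for one frame."""
    spec = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
            for row in bank]


def mfcc(log_energies, num_coeffs):
    """Unnormalized DCT-II of the log mel energies yields MFCCs."""
    m = len(log_energies)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / m)
                for j, e in enumerate(log_energies))
            for c in range(num_coeffs)]


SAMPLE_RATE = 16000   # Hz, hypothetical
FRAME_LEN = 64        # samples, kept tiny so the naive DFT stays fast
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
        for t in range(FRAME_LEN)]
bank = mel_filterbank(8, FRAME_LEN, SAMPLE_RATE)
features = mfsc(tone, bank)    # would feed a neural network in a real system
coeffs = mfcc(features, 4)
```

In a production pipeline the DFT would be an FFT and the frames would overlap. The point here is only that the front end is multiply-accumulate work of the traditional DSP kind, while the back end is neural-network inference.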
Figure 2 shows the HiFi 5 DSP in the high-performance domain. The HiFi 5 DSP has a rich set of traditional DSP MAC engines and specialized Neural Network MAC engines, making it eminently suitable for an amalgam of DSP and AI tasks. Even as the highest performing core for DSP, AI, and mixed workloads, HiFi 5 presents a low dynamic energy profile, still allowing for long battery life while providing the features and quality demanded by the highest quality TWS earbuds.
Note that context detection in the AON domain also entails a hybrid of traditional and AI algorithms. This is in addition to KWS, which has also embraced AI. With its efficient handling of packed 8-bit data types, the HiFi 1 DSP in the AON domain is best suited to context detection and KWS tasks across the board, whether it is for Good-enough, Better, or Best earbuds.
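To show why packed 8-bit data types suit neural-network workloads such as KWS, the following sketch quantizes two small vectors to int8, packs four values into each 32-bit word, and accumulates the dot product in a wider integer. It is a generic illustration of the idea in Python and does not model the HiFi 1 instruction set. The scales, vectors, and function names are hypothetical.

```python
import struct


def quantize_int8(values, scale):
    """Map floats to int8 with a symmetric scale, saturating at the limits."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def pack4(int8_values):
    """Pack int8 values four at a time into 32-bit little-endian words."""
    padded = int8_values + [0] * (-len(int8_values) % 4)
    return [struct.unpack("<I", struct.pack("<4b", *padded[i:i + 4]))[0]
            for i in range(0, len(padded), 4)]


def packed_dot(words_a, words_b):
    """Dot product over packed int8 lanes, accumulated in a wide integer."""
    acc = 0
    for wa, wb in zip(words_a, words_b):
        lanes_a = struct.unpack("<4b", struct.pack("<I", wa))
        lanes_b = struct.unpack("<4b", struct.pack("<I", wb))
        acc += sum(a * b for a, b in zip(lanes_a, lanes_b))
    return acc


# Hypothetical weights, inputs, and quantization scales.
weights = [0.5, -0.25, 0.75, 1.0, -1.0]
inputs = [1.0, 2.0, -1.0, 0.5, 0.25]
W_SCALE, X_SCALE = 1.0 / 127, 2.0 / 127

qw = quantize_int8(weights, W_SCALE)
qx = quantize_int8(inputs, X_SCALE)
acc = packed_dot(pack4(qw), pack4(qx))

approx = acc * W_SCALE * X_SCALE                       # back to real units
exact = sum(w * x for w, x in zip(weights, inputs))
print(f"quantized: {approx:.4f}  float: {exact:.4f}")
```

Each 32-bit word carries four operands, so a machine that multiplies packed lanes in one step does roughly four times the work per memory access compared with 32-bit data, at the cost of a small quantization error.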
For the Best category of earbuds, HiFi 5 + HiFi 1 is the combination of choice for the Big-Little configuration.
Cadence offers a family of HiFi DSPs to suit the cost-performance tradeoffs that different products must make. When targeting Better quality earbuds, one has a choice of five DSPs for the high-performance domain, which only turns on upon demand from the user or context. The Big DSP for Better earbuds can be HiFi 1, HiFi 3, HiFi 3z, HiFi 4, or HiFi 5. These DSPs can be arranged on a graded scale by performance, allowing the SoC designer or OEM to choose the performance-cost profile that suits their product. This granularity helps in trading off performance, cost, and battery life.
In this architecture, the one constant across the board, from Good-enough earbuds to the Best-in-class, is HiFi 1, the AON DSP. Notably, it is also possible to eliminate the AON DSP and use Dynamic Voltage and Frequency Scaling (DVFS) to have the Big DSP act as the AON DSP. This is well-justified in some systems, as the HiFi DSPs are all low-power DSPs. However, a separate ultra-low power DSP, such as HiFi 1 in the AON domain, would be most beneficial to achieve the longest battery life.
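The effect of DVFS can be estimated with the standard first-order CMOS model, in which dynamic power is proportional to effective capacitance times voltage squared times frequency. The sketch below applies that model to a fixed workload. The operating points, effective capacitance, and leakage figure are hypothetical, and leakage is treated as constant across voltages, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    volts: float
    mhz: float


def dynamic_power_mw(c_eff_nf, point):
    """First-order CMOS estimate: P = C_eff * V^2 * f."""
    # nF * V^2 * MHz = 1e-9 * 1e6 W = 1e-3 W, so the result is in mW.
    return c_eff_nf * point.volts ** 2 * point.mhz


def task_energy_uj(c_eff_nf, leak_mw, point, mcycles):
    """Energy to run a fixed workload, including leakage while it runs."""
    seconds = mcycles / point.mhz                 # Mcycles / MHz = seconds
    total_mw = dynamic_power_mw(c_eff_nf, point) + leak_mw
    return total_mw * seconds * 1000.0            # mW * s = mJ, then to uJ


# Hypothetical operating points and silicon parameters.
LOW = OperatingPoint("low", volts=0.6, mhz=50.0)
HIGH = OperatingPoint("high", volts=0.9, mhz=400.0)
C_EFF_NF = 0.1
LEAK_MW = 0.5

for point in (LOW, HIGH):
    power = dynamic_power_mw(C_EFF_NF, point)
    energy = task_energy_uj(C_EFF_NF, LEAK_MW, point, mcycles=1.0)
    print(f"{point.name}: {power:.1f} mW dynamic, {energy:.2f} uJ per Mcycle")
```

In this model the dynamic energy for a fixed number of cycles depends on voltage squared and not on frequency, which is why dropping to a lower voltage point saves energy. The leakage term, however, scales with how long the domain stays powered, so a large core held at a low operating point still pays for its area. That is consistent with the case for a separate, small AON DSP.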
All the HiFi DSPs are source code compatible and come with a rich set of optimized DSP and Neural Network libraries. Therefore, software investments made in one DSP carry over to the other DSPs. Such compatibility allows OEMs to swap out one HiFi DSP for another HiFi DSP with impunity, and thereby, target all three categories of TWS earbuds with one architecture. Utilizing the same family of DSPs also cuts development costs, as the team only needs to learn one set of tools.
Software algorithms may also span the Little and Big DSPs if the two are equipped with shared memory. Customers may choose their own framework to manage the communication between software components, both within a DSP and across the two DSPs. For customers that do not have their own framework, the HiFi DSPs also come with a framework for this purpose, called XAF, which can ease the development of complex audio, speech, and AI processing chains. XAF supports not only the Big-Little architecture but also any number of homogeneous or heterogeneous HiFi DSPs in the SoC. It is quite common for customers to instantiate multiple HiFi DSPs in a cluster to act as the “Big” DSP. Additionally, HiFi DSPs support many commonly used operating systems such as XOS, FreeRTOS, and Zephyr.
In this article, we scaled the design up from Good-enough to Better and Best earbuds. We introduced the Big-Little architecture, in which an always-on subsystem is connected to a performance subsystem. Battery life is preserved because the always-on subsystem, running at ultra-low energy levels, monitors user intent and the environment for all three classes of earbuds. In the Better and Best earbuds, it keeps the Big DSP in power-down or, better still, power-off mode until the AON DSP detects a context change that calls for the Big DSP’s performance.
A single hardware and software architecture melding DSP and AI functions, a single toolchain environment, and a single development team make for happy engineering and product managers, enabling OEMs to deploy a full range of products on time and on budget.
Product managers are among the first to perceive the speed of change in the AI landscape. The rate of migration of AI inferencing to edge devices and the growth of on-device neural networks presently exceed Moore’s law. Use cases are also multiplying. It is challenging to project the AI requirements of ever-growing use cases with any degree of accuracy, even into the near future. Products embarking on design today need to serve the market for the next three to five years. Product managers often ask themselves: how much AI do I need to keep the product relevant for this duration while not being overkill for the present?
Next time, we will look at a solution to the dilemma facing Product Managers when the AI crystal ball is foggy. See you then!