Paul McLellan

Syntiant Wins MLPerf TinyML Benchmark (with Help from Tensilica HiFi)

28 Apr 2022 • 4 minute read

Earlier this month, MLCommons announced the latest round of results for inference. To be clear, there were no results for training announced; I guess those will come later in the year. However, Linley Gwennap covered some aspects of training (based on the most recent training benchmark results), so I'll take a look at that too.

Inference is divided up into four categories: datacenter, edge, mobile, and tiny.

As they put it:

Today, MLCommons, an open engineering consortium, released new results for three MLPerf benchmark suites—Inference v2.0, Mobile v2.0, and Tiny v0.7. These three benchmark suites measure the performance of inference—applying a trained machine learning model to new data. Inference enables adding intelligence to a wide range of applications and systems. Collectively, these benchmark suites scale from ultra-low power devices that draw just a few microwatts all the way up to the most powerful datacenter computing platforms. The latest MLPerf results demonstrate wide industry participation, an emphasis on energy efficiency, and up to 3.3X greater performance ultimately paving the way for more capable intelligent systems to benefit society at large.

MLPerf Training

This graph comes from Linley's opening keynote at the recent Linley Spring Processor Conference. Note that training was not included in the recent MLPerf results, which were for inference only. The x-axis shows the various tests, such as BERT and ResNet. The y-axis shows the results, which have been normalized to the throughput of 8-chip systems to enable comparisons. NVIDIA came out mostly on top, certainly when compared with the other commercial offerings (you can't buy a Google TPU v4, for example). The measurements were done before NVIDIA's Hopper H100 chip was available, so it is not included. The previous generation, the A100, seems to have been used as the "standard," with everything else measured against it. That explains the flat line for the A100 at a normalized performance of 1.0. The purple line is Google's fourth-generation TPU, which sometimes has higher performance than NVIDIA and sometimes lower.

Habana (Intel) and Graphcore are a long way behind, but Habana uses less power than all the others. Interestingly, an Intel Xeon can only get to about 5% of the performance of an NVIDIA GPU.
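The normalization described above is simple to sketch: scale each submission's throughput to a per-8-chip basis, then divide by the 8-chip A100 baseline so the A100 sits at 1.0. All throughput numbers below are made up for illustration; they are not actual MLPerf results.

```python
# Hypothetical illustration of the normalization in Linley's chart:
# raw training throughput is scaled to an 8-chip system, then expressed
# relative to the 8-chip A100 baseline so the A100 plots at exactly 1.0.
# All numbers here are fictional.

def normalize(throughput, num_chips, baseline_8chip):
    """Scale a result to 8 chips, then divide by the 8-chip baseline."""
    per_chip = throughput / num_chips
    eight_chip = per_chip * 8
    return eight_chip / baseline_8chip

# Made-up example: a 256-chip submission vs. an 8-chip A100 baseline.
a100_baseline = 1000.0  # e.g. images/sec on 8 chips (fictional)
score = normalize(throughput=24000.0, num_chips=256, baseline_8chip=a100_baseline)
print(f"normalized performance: {score:.2f}")  # 0.75 -> below the A100 line
```

By construction, `normalize(a100_baseline, 8, a100_baseline)` returns 1.0, which is why the A100 appears as a flat line in the graph.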

MLPerf Tiny

You can view the full results for inference divided up into the four categories:

  • Datacenter
  • Edge
  • Mobile
  • Tiny

I am going to focus on the MLPerf Tiny benchmark since that seems most appropriate for low-power on-chip inference. As MLCommons describes it:

The MLPerf Tiny benchmark suite is intended for the lowest power devices and smallest form factors, such as deeply embedded, intelligent sensing, and internet-of-things applications. The second round of MLPerf Tiny results showed tremendous growth in collaboration with submissions from Alibaba, Andes, hls4ml-FINN team, Plumerai, Renesas, Silicon Labs, STMicroelectronics, and Syntiant. Collectively, these organizations submitted 19 different systems with 3X more results than the first round and over half the results incorporating energy measurements, an impressive achievement for the first benchmarking round with energy measurement.

One of the winners was Syntiant's NDP120, which combines Syntiant's AI accelerator with a Cadence Tensilica HiFi 3 DSP and an Arm Cortex-M0. Since the NDP120 is designed for voice control, Syntiant did not enter any other benchmark scores.

EE Times covered the story and summarized the results:

Syntiant’s NDP120 ran the tinyML keyword spotting benchmark in 1.80 ms, the clear winner for that benchmark (the next nearest result was 19.50 ms for an Arm Cortex–M7 device). This result used 49.59 uJ of energy (for the system) at 1.1V/100 MHz. Turning the supply voltage down to 0.9 V (and reducing clock frequency to 30 MHz) reduced Syntiant’s energy to 35.29 uJ, but increased latency to 4.30 ms.
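The energy and latency figures above also imply the average system power during an inference, since power is simply energy divided by time. Working it out from the quoted numbers:

```python
# Average system power implied by the EE Times figures:
# power (W) = energy per inference (J) / latency (s).
# A µJ divided by a ms is conveniently a mW, so no unit conversion needed.
# All inputs are taken directly from the quoted results.

def avg_power_mw(energy_uj, latency_ms):
    """Average power in milliwatts from energy (µJ) and latency (ms)."""
    return energy_uj / latency_ms  # µJ / ms == mW

fast = avg_power_mw(49.59, 1.80)   # 1.1 V / 100 MHz setting
slow = avg_power_mw(35.29, 4.30)   # 0.9 V / 30 MHz setting

print(f"1.1 V / 100 MHz: {fast:.1f} mW")   # ~27.6 mW
print(f"0.9 V /  30 MHz: {slow:.1f} mW")   # ~8.2 mW
```

So dropping the voltage and clock cut the average power by more than 3X, at the cost of a little over 2X the latency, and still left total energy per inference lower.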

The Syntiant NDP120 is based on Syntiant's second-generation AI accelerator core, alongside a Tensilica HiFi DSP for feature extraction and an Arm Cortex-M0 core for system management.

I wondered what the HiFi was doing, so I asked Prakash Madhapathy, one of our experts. He told me:

Syntiant is an innovative startup that is developing a very low-power AI SoC. They target a lot of the same application areas that HiFi targets, such as smartphones, earbuds, wearables, remote controls, drones, security cameras, and sensors. AI generally cannot work without meaningful features extracted from the raw data (speech or audio), and that's where HiFi 3 comes in to complete the picture. They also depend on HiFi for all the audio codecs in many of these applications.
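Feature extraction of the kind Prakash describes typically turns raw audio into frames of spectral features before the neural network ever sees them. Here is a minimal numpy sketch of the general idea (framing, windowing, FFT, log compression); this is a generic illustration of keyword-spotting front ends, not Syntiant's or the HiFi 3's actual pipeline.

```python
import numpy as np

def log_spectral_frames(audio, sample_rate=16000, frame_ms=25, hop_ms=10, n_fft=512):
    """Slice audio into overlapping windows and return log-magnitude spectra.

    A generic sketch of audio feature extraction for keyword spotting,
    NOT the actual HiFi 3 / NDP120 implementation.
    """
    frame_len = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)       # 160 samples at 16 kHz
    window = np.hanning(frame_len)                   # taper to reduce spectral leakage
    frames = []
    for start in range(0, len(audio) - frame_len + 1, hop_len):
        frame = audio[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame, n=n_fft))
        frames.append(np.log(spectrum + 1e-10))      # log compresses dynamic range
    return np.array(frames)

# One second of a 440 Hz tone as a stand-in for real microphone audio.
t = np.arange(16000) / 16000
audio = 0.1 * np.sin(2 * np.pi * 440 * t)
features = log_spectral_frames(audio)
print(features.shape)  # (98, 257): frames x (n_fft // 2 + 1) bins
```

Real front ends usually go one step further and project the spectra onto a mel filterbank, but the frame/window/FFT/log structure above is the compute-heavy part that a DSP like HiFi is well suited for.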

Learn More

The website for Syntiant.

The Tensilica HiFi product page.


Sign up for Sunday Brunch, the weekly Breakfast Bytes email.
