Meet the Newest Member of the Tensilica Family, the HiFi 3z DSP

25 Jul 2017 • 4 minute read

As NVIDIA's Jensen Huang pointed out a few weeks ago, and as I covered in yesterday's blog post, NVIDIA's Jensen Huang on Accelerating the Race to Autonomous Cars, processors are not getting any faster. Instruction-level parallelism is all tapped out, too, since every architectural trick has been tried, as I described in Are General Purpose Microprocessors Over?

The solution is to offload the general-purpose processor with something optimized for some subset of the workload. This is one reason Microsoft Azure Cloud puts an Altera FPGA on every server, and presumably why Intel purchased Altera. Google puts one of its proprietary tensor processing units (TPUs) on each server, and I've read reports that doing so saves it as many as a dozen entire datacenters.

Supercomputers typically put a large number of NVIDIA GPUs with each CPU, adding a lot of smaller cores that are still fairly general purpose, and can be used particularly for processing the training phase to create a system based on deep neural nets.

An SoC, such as a smartphone application processor, has a more constrained power budget, and also several different "subsets of the workload" that can potentially be optimized:

Graphics processing
Cellular modem signal processing
Voice recognition
Audio playback

For IoT, the power constraints are even more severe. Devices range from data sensors that run on batteries for multiple years, through devices that users expect to charge only weekly, up to things like Amazon's voice-activated Alexa, which has a power supply but still has to hit a very low mass-market price point.

Anatomy of an IoT Device

Each IoT device is a little bit different, but as a general rule they contain several sensors such as accelerometers, microphones, magnetometers, light sensors, and more. They are expected to communicate through one of the lower-bandwidth wireless interfaces such as:

NB-IoT (narrowband IoT, which is deploying globally and is the best choice for longer distances)
WiFi 802.11ah (HaLow)
BLE (Bluetooth low-energy)
GNSS (global navigation satellite system, colloquially GPS)

Typically, the lowest-power IoT devices have a low duty cycle. They are like kittens: they sleep a lot. So they will usually have some wake processing ("Alexa"), then do some signal processing of sensor data, and then either communicate with the cloud or perhaps hand off to an on-board application processor for additional processing, in something like a low-end cellphone.
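To see why a low duty cycle matters so much for battery life, here is a minimal sketch of the arithmetic in Python. All the numbers (active current, sleep current, battery capacity) are hypothetical values chosen for illustration, not figures for any real device, and the model ignores battery self-discharge and regulator losses.

```python
def average_current_ma(active_ma, sleep_ma, duty_cycle):
    """Time-weighted average current for a device that is awake
    for the fraction `duty_cycle` of the time and asleep otherwise."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma


def battery_life_days(capacity_mah, avg_ma):
    """Idealized battery life in days (no self-discharge, no losses)."""
    return capacity_mah / avg_ma / 24.0


# Hypothetical device: all three values are assumptions for illustration.
ACTIVE_MA = 10.0      # current while processing or transmitting
SLEEP_MA = 0.005      # current while asleep
CAPACITY_MAH = 220.0  # roughly a coin-cell-sized battery

if __name__ == "__main__":
    for duty in (1.0, 0.01, 0.001):
        avg = average_current_ma(ACTIVE_MA, SLEEP_MA, duty)
        days = battery_life_days(CAPACITY_MAH, avg)
        print(f"duty cycle {duty:.1%}: average {avg:.4f} mA, about {days:.0f} days")
```

The point of the sketch is that once the device sleeps most of the time, the sleep current and the cost of each wake-up dominate, which is why wake processing has to be so cheap.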

IoT Requirements

The types of applications go from a thermostat at one end, to voice recognition at the other. So the signal processing requirements in both resolution and throughput vary tremendously. One size does not fit all, and if there is a short summary, it is that IoT requires both low- and high-precision fixed- and floating-point MACs.

HiFi 3z DSP

Today, at the Linley IoT conference, Cadence announced the latest member of the family, the HiFi 3z DSP. This provides 1.3X better voice and audio performance than the previous model, the industry-leading HiFi 3 DSP.

I won't run through every detail of the improved architecture. If you need the details, there is plenty on the Cadence website, and there is a summary in the table above. If you want to use it then it is available now.

Tensilica HiFi and Fusion F1 Architecture

Cadence's Tensilica Fusion F1 instruction set architecture (ISA) has a number of click-box configuration options. These are all pre-verified and different options are applicable for different capabilities. The block diagram shows the base ISA in green (although even that is configurable to data width). The additional blue boxes show options such as AES-128 encryption or Viterbi decode.

One option, in particular, is a single-precision floating-point unit. It used to be received wisdom that floating point was just something you didn't do on an embedded CPU. You would develop the algorithm using floating point, probably in MATLAB on a workstation, and then spend a lot of time playing with precision and alignment to create a fixed-point implementation. But floating point has become very attractive from both a power consumption and a performance point of view. Of course, the biggest attraction is the reduced time from designing the algorithm to getting it up and running on an emulator or development board.
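As a rough illustration of the float-to-fixed conversion work described above, the following Python sketch computes the same short multiply-accumulate (MAC) loop twice: once in floating point and once in Q15 fixed point (16-bit values with 15 fractional bits). The Q15 format, the helper names, and the sample data are my own choices for illustration; this is not Tensilica code, and a real port would also have to deal with scaling between stages and overflow in the accumulator.

```python
Q15_SCALE = 1 << 15  # 32768: Q15 stores values in [-1, 1) as 16-bit integers


def saturate_q15(value):
    """Clamp an integer to the signed 16-bit range used by Q15."""
    return max(-Q15_SCALE, min(Q15_SCALE - 1, value))


def to_q15(x):
    """Quantize a float in [-1, 1) to Q15, saturating at the limits."""
    return saturate_q15(int(round(x * Q15_SCALE)))


def dot_float(coeffs, samples):
    """Reference MAC loop in floating point."""
    return sum(c * s for c, s in zip(coeffs, samples))


def dot_q15(coeffs_q15, samples_q15):
    """Fixed-point MAC loop: each Q15 x Q15 product is Q30, summed in a
    wide accumulator, then rounded back to Q15 and saturated."""
    acc = 0
    for c, s in zip(coeffs_q15, samples_q15):
        acc += c * s
    return saturate_q15((acc + (1 << 14)) >> 15)


if __name__ == "__main__":
    # Hypothetical filter taps and input samples.
    coeffs = [0.25, 0.5, 0.25]
    samples = [0.1, -0.3, 0.7]

    exact = dot_float(coeffs, samples)
    fixed = dot_q15([to_q15(c) for c in coeffs], [to_q15(s) for s in samples])
    print("float result:", exact)
    print("Q15 result  :", fixed / Q15_SCALE)
    print("error       :", abs(exact - fixed / Q15_SCALE))
```

The floating-point version is one line and matches the algorithm as designed. The fixed-point version needs decisions about format, rounding, and saturation at every step, which is the time that a hardware floating-point unit lets you skip.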

As I said, the HiFi DSPs are a family of processors (even before considering that they are all further configurable). In fact they form the most widely licensed voice/audio/speech processors on the market. They are all software compatible, with a wide selection of codecs and audio/voice enhancement packages available from a large ecosystem of over 200 partners. The table below summarizes the family.

More details are on the Cadence website product page.
