Tensilica FloatingPoint DSP Family

29 Jun 2021 • 3 minute read

Recently, Cadence announced the availability of the Tensilica FloatingPoint DSP family. I expect you thought that Tensilica already had floating-point DSPs, and it is true that the existing Tensilica processors have an optional floating-point unit. However, the new family is optimized for floating-point processing with improved power, performance, and area (PPA), and is suitable for a wide range of modern floating-point-centric applications with an easy-to-use programming environment.

Historically, DSP algorithms were developed in floating-point. If you go back a couple of decades, this would be done on a workstation, and then the algorithm would be optimized for fixed-point, which was all that DSPs supported back then. When floating-point units were first added to DSPs, they were slow. With the transistor budgets of several Moore's Law generations ago, they had to be iterative, so they were still a lot slower than fixed-point.

But modern floating-point units are fast and efficient, so there is no longer any pressure to convert to fixed-point for the "real" implementation. Today, floating-point is used in a wide range of applications such as drones, earbuds, virtual reality, video games, and high-performance servers. For that reason, existing Tensilica processors have had a floating-point option. But for people who are using floating-point only (for the values being processed; obviously things like loop indexes will still be fixed-point, aka integers), this solution was not optimal. Until today.

The New FloatingPoint Processors

In April, we announced two new Tensilica Vision DSPs, which I wrote about in my post Tensilica Vision Q8 and P1 DSPs, More AND Less. In that post, I drew the following picture to show how the portfolio of Vision DSPs spreads from low performance to high performance.

Well, here is the new Tensilica FloatingPoint DSP product line:

The four processors differ a lot in their capabilities, just like their non-floating-point cousins. But they are all optimized for floating-point algorithms, without also having all the optimization for fixed-point algorithms that is in the processors we announced in April.

So now there are three different ranges of processors, depending on the requirements:

Fixed-point only: P1, P6, Q7, or Q8 with no floating-point option
Intensive fixed-point and floating-point: P1, P6, Q7, or Q8 with floating-point option
Floating-point only: the new KP1, KP6, KQ7, or KQ8

The PPA varies across the range. But in other ways, all four processors have similar characteristics (actually, there are slight differences, see the table at the end of this post):

IEEE 754 vector floating-point
- Half-precision
- Single-precision
- Double-precision
Enhanced FFT, convolution, and matrix operations
Vector divide, RECIP, RSQRT, SQRT
Xtensa LX Secure Mode

The result with Tensilica FloatingPoint DSPs compared to a fixed-point processor with a floating-point unit is:

40% smaller, since the area is optimized for floating-point workloads without fixed-point overhead
25% improved FMA operations (this is the floating-point equivalent of MAC operations in fixed-point, the heart of any DSP algorithm)
30% lower energy consumption
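
To make the FMA point concrete, here is a minimal sketch in plain C of the multiply-accumulate pattern that sits at the heart of most DSP kernels. The function name dot_product is hypothetical and chosen only for illustration; nothing here is specific to the Tensilica toolchain. Each loop iteration is one multiply followed by one add, which is the pair of operations that FMA hardware performs as a single instruction with a single rounding step.

```c
#include <stddef.h>

/* Dot product of two float arrays (illustrative name, not a Tensilica API).
 * Each iteration is one multiply plus one add: the pattern that a
 * fused multiply-add (FMA) instruction executes as a single operation. */
float dot_product(const float *x, const float *y, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i] * y[i];   /* multiply-accumulate step */
    }
    return acc;
}
```

Whether a given compiler actually fuses the multiply and the add depends on the target and on its floating-point contraction settings. Standard C also provides fmaf in math.h for requesting the fused operation explicitly.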

Software

There are many different sources for the software for a DSP like this, with vector operations and an underlying VLIW architecture. For example:

Nothing exists, the software is being written from scratch
A scalar (non-vector) implementation exists
An existing implementation exists, probably making use of common libraries such as the standard C math library or the widely-used open-source Eigen linear algebra library

The software toolchain for the Tensilica FloatingPoint DSPs can handle all of this automatically. In particular:

Instruction bundling and scheduling to take vanilla sequential code and optimize for VLIW architecture
Auto vectorization to take scalar code and adapt to take advantage of the vector operations available
Support for LLVM datatypes
Exactly the same code will run on any of the four FloatingPoint DSPs, making it easy to move up for more performance, or down for more aggressive power and area optimization
Availability of floating-point optimized libraries such as our NatureDSP and SLAM libraries, or other publicly available libraries
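
As a rough illustration of what "vanilla sequential code" means here, the sketch below is the kind of scalar loop that auto-vectorizing compilers in general are good at handling. It is an assumption-laden example, not output from or a claim about the Tensilica compiler: the function name saxpy is illustrative, and the restrict qualifiers are a standard C99 way of promising that the two arrays do not overlap, which is typically what lets a vectorizer process several elements per instruction.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i], written as ordinary scalar C.
 * The restrict qualifiers tell the compiler x and y do not alias,
 * so iterations are independent and can be grouped into vector operations. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The source stays scalar and portable; any mapping onto vector and VLIW resources is left to the toolchain, which is also what allows the same code to be rebuilt for a different member of a processor family.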

This optimized toolchain provides straightforward migration of existing implementations to these new processors, the capability to develop the highest-performance software, and accelerated time-to-market.

Summary

There are four new Tensilica processors optimized for floating-point algorithm implementation, with outstanding PPA: high performance, low energy, and small area.

There is an easy software development environment with superior auto-vectorization, instruction bundling and scheduling, and optimized floating-point libraries.


So the best processors and the easiest migration. Learn more on the Tensilica FloatingPoint DSP Family product page.
