Designing for Low Power… Begin at the Beginning

1 May 2017 • 3 minute read

So you have your RTL written, and it’s time to optimize to reduce power. If that’s your plan, you are likely leaving power on the table. It’s not that you can’t get a lot of savings with existing RTL synthesis tools (you certainly can!), but the biggest bang for the buck comes from early design decisions.

Next time… start sooner!

In fact, experts (not me, real experts) estimate that optimal architectural (pre-RTL) decisions can reduce power by 80% or more. Stated as the inverse, architectural decisions that are poor for power can lead to 5X greater power consumption than more power-efficient architectures.
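For anyone who wants to check that the two numbers are the same claim, here is a minimal sketch of the arithmetic. The function name is mine, purely for illustration; the only input taken from the text is the 80% figure.

```python
def power_ratio_from_reduction(reduction_fraction: float) -> float:
    """Convert a fractional power reduction into a 'how many times worse' ratio.

    A design that saves `reduction_fraction` of the power draws
    (1 - reduction_fraction) of the original, so the unoptimized design
    draws 1 / (1 - reduction_fraction) times as much.
    """
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return 1.0 / (1.0 - reduction_fraction)


# The 80% reduction quoted above corresponds to a 5X ratio.
ratio = power_ratio_from_reduction(0.80)
print(f"{ratio:.1f}X")
```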

The problem is that architectural trade-offs are difficult to evaluate, especially when it comes to power. Designers often have a good intuitive feel for the performance and area impacts of decisions they make when writing RTL, but that is not as true when it comes to power. Typically, power is only measured at the RTL or gate level, and by then it is not realistic to consider changing the architecture. (For more on that, you can check out my blog post from earlier this year lamenting the realities of hardware design.)

So, what’s a designer to do?

At last week’s 2017 International Symposium on VLSI Design, Automation and Test (2017 VLSI-DAT) in Hsinchu, Taiwan, Cadence’s Tung-Hua Yeh presented a paper titled “High-Level Low-Power System Design Optimization.”

He proposed a methodology where the designer can create multiple high-level architectures in SystemC, and then use HLS to synthesize each architecture to multiple RTL implementations with different micro-architectures. Each implementation can be quickly evaluated in terms of performance, area, and even power when using state-of-the-art tools. This allows the designer to use quantitative analysis to evaluate architectural decisions, a significant upgrade from blind guesses.

In its most vanilla form, quantitative analysis can be used as a pure “shotgun” approach. In other words, generate many points, and hope to find an interesting one. However, it can also be used with a more directed approach, which is where the value of quantitative analysis can truly shine.
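To make "find an interesting one" a bit more concrete, here is a minimal sketch of one common way to sift a shotgun blast: keep only the points that no other point beats on both area and power (a Pareto filter). To be clear, this is my illustration and not necessarily what the paper does. The `Implementation` type, the names, and every number below are made up.

```python
from typing import List, NamedTuple


class Implementation(NamedTuple):
    """One synthesized design point (all values here are hypothetical)."""
    name: str
    area_um2: float
    power_mw: float


def dominates(a: Implementation, b: Implementation) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.area_um2 <= b.area_um2 and a.power_mw <= b.power_mw
    strictly_better = a.area_um2 < b.area_um2 or a.power_mw < b.power_mw
    return no_worse and strictly_better


def pareto_front(candidates: List[Implementation]) -> List[Implementation]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up results from a hypothetical exploration run.
candidates = [
    Implementation("arch_a_unroll1", 12000.0, 3.0),
    Implementation("arch_a_unroll4", 20000.0, 2.2),
    Implementation("arch_b_pipelined", 15000.0, 3.5),  # beaten by arch_a_unroll1
    Implementation("arch_c_shared", 9000.0, 4.1),
]

front = pareto_front(candidates)
for impl in front:
    print(impl.name, impl.area_um2, impl.power_mw)
```

The survivors are the points worth a closer look; everything else is strictly worse than something you already have.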

In the directed approach, you are looking at one implementation at a time (perhaps starting with one from the “shotgun blast”). You then dive deeper into the power analysis of that design to find where power is being dissipated and what is causing that. The results are linked back to the original code so you can change your high-level algorithm or constraints, and then iterate from there.

Tung-Hua walked through two examples of applying quantitative analysis. The first was a public 2-D IDCT example for which they generated 61 different viable implementations. Power and area estimates were generated for each implementation. (Performance was a constraint. Target throughputs, latencies, and clock speeds were given. If an implementation could not be generated that met the performance requirements, it was excluded.) Each implementation was evaluated for energy-efficiency (energy/block).
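As a rough sketch of that kind of evaluation, the snippet below drops implementations that miss a throughput target and ranks the rest by energy per block. I am assuming energy per block is simply average power divided by the block rate the implementation sustains; the paper may compute it differently. The field names and all of the numbers are hypothetical.

```python
def energy_per_block_nj(power_mw: float, blocks_per_second: float) -> float:
    """Energy per block in nanojoules, assuming average power / block rate."""
    if blocks_per_second <= 0:
        raise ValueError("blocks_per_second must be positive")
    watts = power_mw * 1e-3                      # mW -> W
    joules_per_block = watts / blocks_per_second
    return joules_per_block * 1e9                # J -> nJ


def rank_viable(implementations, required_blocks_per_second):
    """Exclude implementations that miss the target, then sort by energy/block."""
    viable = [impl for impl in implementations
              if impl["blocks_per_second"] >= required_blocks_per_second]
    return sorted(
        viable,
        key=lambda impl: energy_per_block_nj(impl["power_mw"],
                                             impl["blocks_per_second"]),
    )


# Made-up estimates for three hypothetical implementations.
implementations = [
    {"name": "impl_01", "power_mw": 2.0, "blocks_per_second": 1_000_000},
    {"name": "impl_02", "power_mw": 3.0, "blocks_per_second": 2_000_000},
    {"name": "impl_03", "power_mw": 0.5, "blocks_per_second": 200_000},  # too slow
]

ranked = rank_viable(implementations, required_blocks_per_second=1_000_000)
for impl in ranked:
    energy = energy_per_block_nj(impl["power_mw"], impl["blocks_per_second"])
    print(f'{impl["name"]}: {energy:.2f} nJ/block')
```

Note that the lowest-power point (impl_03) never makes the list, and the highest-power point (impl_02) ranks first on energy per block because it finishes each block faster.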

He found that the most energy-efficient architecture varied with the overall performance constraint (samples/second). Depending on the constraint, one of three architectures was best. Interestingly, he also showed that the most energy-efficient implementation at some throughputs was the least energy-efficient at high throughputs. (This architecture is shown with the blue triangles below.)
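Here is a toy illustration of that crossover. The three architectures, the throughput targets, and the energy values are all invented to show the shape of the result; they are not the data from the paper.

```python
# Hypothetical energy per block (nJ) for three architectures at three
# throughput targets (arbitrary units). Invented numbers, not paper data.
energy_nj = {
    "arch_1": {1: 1.0, 10: 1.4, 100: 3.0},
    "arch_2": {1: 1.5, 10: 1.2, 100: 2.0},
    "arch_3": {1: 2.0, 10: 1.6, 100: 1.5},
}


def best_and_worst(table, target):
    """Return (most efficient, least efficient) architecture at one target."""
    ranked = sorted(table, key=lambda arch: table[arch][target])
    return ranked[0], ranked[-1]


for target in (1, 10, 100):
    best, worst = best_and_worst(energy_nj, target)
    print(f"throughput {target}: best={best}, worst={worst}")
```

With these made-up numbers, each architecture wins at one target, and arch_1 goes from best at the lowest throughput to worst at the highest. That is why a single "low-power architecture" picked by intuition can be the wrong answer once the performance target moves.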

Admittedly, that is just a fairly small IDCT, so he included a discussion of a real-world proprietary software-defined radio application, as well. By using the methodology of quantitative trade-off analysis, including directed exploration, this application was significantly smaller and more power-efficient than originally budgeted. This methodology is credited with reducing the area of the design by >25% with a 4x improvement in power.

That is especially impressive when you realize that the budget estimates were based on the previous-generation hand-written RTL.

By the way, I started this off by saying that optimizing the RTL is too late to maximize your impact on power. Needless to say, if you are starting power optimization even later with a netlist or placed gates, you are definitely leaving a lot of power on the table. And if you run into a power problem, it will be much harder to solve.

When do you typically start optimizing for power in earnest? Let me know in the comments or by dropping me an email.