ST's Experience with Cadence Cerebrus

20 Jan 2023 • 3 minute read

At CadenceLIVE Europe back in Thanksgiving week, one of the presentations was by Olivier Uliana of STMicroelectronics, titled "Cerebrus PPA Optimization on the Next-Generation High-End Microcontroller CPU Core." In case you've forgotten what Cadence Cerebrus is, see my post from when we announced the product, Cadence Cerebrus - Intelligent Chip Explorer. Or here is the value proposition from the product page:

The Cadence Cerebrus Intelligent Chip Explorer is a revolutionary, machine learning-driven, automated approach to chip design flow optimization. Block engineers specify the design goals, and Cerebrus will intelligently optimize the Cadence digital full flow to meet these power, performance, and area (PPA) goals in a completely automated way. By adopting Cerebrus, it is possible for engineers to concurrently optimize the flow for multiple blocks, which is especially important for the large, complex system-on-chip (SoC) designs needed for today’s ever more powerful electronic systems. Additionally, through the Cerebrus full flow reinforcement learning technology, engineering team productivity is greatly improved.

The Design Used

So what was ST's experience? The test vehicle was:

Sub-20nm technology
580K instances
28 macros
Multi-Vt/PB standard cells
FBB (forward body bias)
AVS (adaptive voltage scaling)
LVF and POCV margins

So not a billion transistor chip but still pretty sizable. And it came with the usual set of implementation challenges:

RC effect is a major contributor to timing
Several dominant timing corners due to body bias and voltage compensation
Choice of dominant corners is key for multi-corner implementation
Setup/hold conflict is complex due to RC effect and usage of useful skew
Xtalk effect becomes an important contributor to timing
Context-based timing requires aggressive spacing cells abutment rules
Electromigration rules require specific prevention on clock and data paths
Persistent margin (synthesis/placement/CTS/route) used for signoff timing correlation

Cadence Cerebrus

ST's expectations were to:

Use Cadence Cerebrus machine-learning to optimize the core PPA much more than the traditional manual flow
Push the frequency to the target and optimize the leakage and area
Optimize the productivity by reducing the time to achieve the PPA target

A key aspect of Cadence Cerebrus is that the user gets to choose how to weight the various attributes of the design to be optimized (for example, is timing more important than area?). The actual metrics to be optimized are a bit deeper, namely:

Timing setup worst negative slack (WNS) and total negative slack (TNS)
Timing hold WNS/TNS
Power switching/internal/leakage
Design congestion/density

These metrics can be weighted by the user, although by default they are all 1 (so equally weighted). As Olivier put it:

cost-weight adjustment is key to optimize your objectives
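To make the weighting idea concrete, here is a minimal Python sketch of a weighted cost over the metrics listed above. It is only an illustration: the metric names, the normalization against a baseline run, and the simple weighted sum are my assumptions, not the actual cost function or identifiers used inside Cadence Cerebrus, and the scenario values are made up.

```python
# Hypothetical metric names mirroring the list above; not Cerebrus identifiers.
METRICS = (
    "setup_wns", "setup_tns", "hold_wns", "hold_tns",
    "switching_power", "internal_power", "leakage_power",
    "congestion", "density",
)


def weighted_cost(normalized, weights=None):
    """Return sum(weight * normalized metric); lower is better.

    `normalized` maps each metric to a value relative to a baseline run
    (1.0 = same as baseline, below 1.0 = better than baseline).
    Any metric without an explicit weight gets the default weight of 1.
    """
    weights = weights or {}
    return sum(weights.get(name, 1.0) * normalized[name] for name in METRICS)


# Made-up scenarios: A trades a little setup TNS for much lower leakage,
# B improves setup TNS at the cost of higher leakage.
baseline = {name: 1.0 for name in METRICS}
scenario_a = {**baseline, "leakage_power": 0.78, "setup_tns": 1.10}
scenario_b = {**baseline, "leakage_power": 1.20, "setup_tns": 0.90}

# With all weights at 1, the leakage-friendly scenario A scores lower (better).
default_winner = min(("A", scenario_a), ("B", scenario_b),
                     key=lambda item: weighted_cost(item[1]))[0]

# Raising the setup TNS weight flips the choice to scenario B.
timing_heavy = {"setup_tns": 5.0}
timing_winner = min(("A", scenario_a), ("B", scenario_b),
                    key=lambda item: weighted_cost(item[1], timing_heavy))[0]

print(default_winner, timing_winner)
```

The same two candidate runs rank differently depending on the weights, which is why the choice of weights matters so much to what the exploration converges on.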

Another part of setting up Cadence Cerebrus is defining the resources that will be made available (basically, what servers Cadence Cerebrus can use). ST used:

16 maximum parallel jobs
16 CPUs per job
243 scenarios explored
16-day runtime
disk space x20

Obviously, if you look at those numbers, a Cadence Cerebrus exploration is resource-hungry and time-consuming. But in some ways, that is the point: the idea is to replace a lot of engineering time with a lot of compute time.
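For a rough feel of what those numbers imply, here is a back-of-the-envelope calculation in Python. The inputs are the figures ST reported; the derived values assume every job slot was busy for the whole 16 days, which the presentation did not state, so treat them as upper bounds rather than measurements.

```python
import math

# Figures reported by ST.
max_parallel_jobs = 16
cpus_per_job = 16
scenarios = 243
runtime_days = 16

# Peak CPU count if all job slots are in use at once.
peak_cpus = max_parallel_jobs * cpus_per_job

# Minimum number of "waves" of jobs needed to cover all scenarios.
min_waves = math.ceil(scenarios / max_parallel_jobs)

# Upper bound on CPU-hours, assuming full utilization for the whole run.
cpu_hours_upper_bound = peak_cpus * runtime_days * 24

# Average slot time available per scenario under the same assumption.
avg_slot_days_per_scenario = max_parallel_jobs * runtime_days / scenarios

print(f"peak CPUs: {peak_cpus}")
print(f"minimum waves of jobs: {min_waves}")
print(f"CPU-hours (upper bound): {cpu_hours_upper_bound}")
print(f"slot-days per scenario (average): {avg_slot_days_per_scenario:.2f}")
```

So the exploration could occupy up to 256 CPUs at a time, with roughly one slot-day available per scenario on average.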

PPA Results

Figure: ST's best results

The implementation had 12 PVT corners for timing and power optimization. The base run (manual) took three days for one iteration, whereas Cadence Cerebrus ran for 16 days. But Cadence Cerebrus produced PPA improvements against the base run of +1% frequency, -22% leakage, and -3.5% area.

By changing the weights and focusing on different specific parameters (at the expense of other parameters, of course), ST could get:

Up to 6% frequency (versus base run)
Up to 20% area reduction
Up to 22% leakage reduction
Up to 100% setup TNS reduction
Up to 90% hold TNS reduction

Figure: replay results

Another aspect of Cadence Cerebrus is model replay. Once the best scenario has been extracted from a Cadence Cerebrus exploration, it can be used to apply the best recipes and primitives to a similar database without running a full exploration again, which saves a lot of runtime. That produced improvements of -21% leakage, -7% total power, and -0.5% area.

Summary

Cadence Cerebrus flow can easily be plugged into the existing Innovus flow
Computing resource needs must be anticipated
Cadence Cerebrus ML provides a significant and predictable gain on CPU PPA
Cadence Cerebrus model replay is an efficient way to optimize the final run without launching another full exploration

Learn More

See the Cadence Cerebrus Intelligent Chip Explorer product page.
