Arm Applies Cadence Cerebrus to Optimize PPA of Next-Gen 3nm Core Implementation

22 Feb 2023 • 6 minute read

The world’s insatiable demand for data and data processing is driving innovation in cloud and server infrastructure. High-performance computing (HPC) and hyperscale customers expect better cloud workload performance from CPUs without any increase in power or area. The industry needs efficient data center servers that deliver maximum performance for today’s complex workloads.

Arm is a leader in CPU IP for server and infrastructure SoCs. Optimizing the power, performance, and area (PPA) of an Arm-based SoC has become more challenging as advanced nodes grow more complex. Cadence collaborates with Arm to help mutual customers overcome these challenges and reduce their time to market. As part of that collaboration, Arm used the Cadence Cerebrus Intelligent Chip Explorer to push the performance of its latest Neoverse V2 core with less effort and a fast time to results.

During CadenceLIVE Silicon Valley 2022, Arm presented the improvements that resulted from enabling the Cadence Cerebrus solution in its existing implementation flow. Starting from a mature baseline, this integration improved both PPA (a 38% reduction in leakage power, a 1.7% improvement in utilization, and a 3.2% improvement in total negative slack) and team productivity.

How Does Cadence Support Arm Neoverse V2-Based SoCs?

The Arm Neoverse V2 platform combines the latest V-series core with the Arm Neoverse CMN-700 Coherent Mesh Network and several Armv9 architectural security enhancements to deliver the highest integer performance for cloud and HPC workloads. Arm observed several benefits that led it to deploy Cadence Cerebrus in its existing digital implementation flow: ease of integration, improved PPA, and higher productivity.

Ease of Integration

Cadence Cerebrus is easy to adopt because it wraps around the existing implementation flow (as shown in Figure 1), so there is no need to build a new flow first. It manages flow changes automatically by injecting them at the tool command level. After each run, it extracts PPA metrics and learns from them to improve PPA further in the next round. Designers steer this process by specifying the main optimization objectives based on their design and process knowledge.
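The following minimal Python sketch makes the wrap-around loop concrete. It shows a generic closed-loop optimizer running around an unchanged flow. It is not the Cerebrus algorithm or API. `run_flow` is a hypothetical toy model standing in for a real implementation run, and the knob names in `SEARCH_SPACE` are invented. The learning step is a simple "explore around the best result so far" search.

```python
import random
from dataclasses import dataclass

# Hypothetical knobs the wrapper may inject into the existing flow at the
# tool-command level. Names and values are illustrative assumptions only.
SEARCH_SPACE = {
    "place_density": [0.60, 0.65, 0.70, 0.75],
    "timing_effort": ["medium", "high"],
    "leakage_effort": ["low", "high"],
}


@dataclass
class Run:
    params: dict
    metrics: dict
    cost: float


def run_flow(params: dict) -> dict:
    """Toy stand-in for one run of the unchanged implementation flow.

    Made-up trends: higher density raises utilization but hurts timing,
    high timing effort improves TNS but costs leakage, and high leakage
    effort cuts leakage at a small timing cost.
    """
    high_timing = params["timing_effort"] == "high"
    high_leakage = params["leakage_effort"] == "high"
    tns = -2.0 - 20.0 * (params["place_density"] - 0.60)
    tns += 1.5 if high_timing else 0.0
    tns -= 0.5 if high_leakage else 0.0
    leakage = 10.0 + (2.0 if high_timing else 0.0) - (3.0 if high_leakage else 0.0)
    return {"tns": tns, "leakage": leakage, "utilization": params["place_density"]}


def evaluate(params: dict, weights: dict) -> Run:
    """Run the flow, extract PPA metrics, and score them (lower is better)."""
    m = run_flow(params)
    score = (-weights["tns"] * m["tns"]            # TNS <= 0, so -TNS is a penalty
             + weights["leakage"] * m["leakage"]
             - weights["utilization"] * m["utilization"])
    return Run(params, m, score)


def mutate(params: dict, rng: random.Random) -> dict:
    """Propose a new setting by changing one knob of a known-good one."""
    knob = rng.choice(sorted(SEARCH_SPACE))
    child = dict(params)
    child[knob] = rng.choice(SEARCH_SPACE[knob])
    return child


def optimize(baseline: dict, weights: dict, rounds: int = 4,
             per_round: int = 5, seed: int = 0) -> list:
    """Each round explores around the best result learned so far."""
    rng = random.Random(seed)
    history = [evaluate(baseline, weights)]
    for _ in range(rounds):
        incumbent = min(history, key=lambda r: r.cost)
        history += [evaluate(mutate(incumbent.params, rng), weights)
                    for _ in range(per_round)]
    return history


baseline = {"place_density": 0.70, "timing_effort": "medium", "leakage_effort": "low"}
weights = {"tns": 1.0, "leakage": 0.5, "utilization": 2.0}  # designer objectives
history = optimize(baseline, weights)
best = min(history, key=lambda r: r.cost)
print(f"baseline cost {history[0].cost:.2f}, best cost {best.cost:.2f}: {best.params}")
```

The `weights` dictionary plays the role of the designer-specified objectives. Changing it, for example by weighting leakage more heavily, steers the search toward a different PPA trade-off without touching `run_flow`. In the same way, the wrapper leaves the underlying flow itself unchanged.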

Figure 1: Cadence Cerebrus integration into existing Arm implementation flow

Improved PPA

As shown in Figure 2, integrating Cadence Cerebrus into the existing Arm implementation flow yielded a 3.2X improvement in total negative slack (TNS), a 38% improvement in leakage power, and a 1.7% improvement in utilization over the baseline produced by the regular flow. All of these improvements were obtained with a fully automated method.
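For readers less familiar with these metrics, here is a minimal sketch of how TNS and relative improvements are commonly computed. It assumes per-endpoint slack values in nanoseconds. It reads an "NX" TNS improvement as the ratio of baseline TNS to new TNS. The slack and leakage numbers are hypothetical illustrations, not Arm's data.

```python
def total_negative_slack(slacks_ns):
    """TNS: sum of the slacks of violating endpoints (0.0 if timing is met)."""
    return sum(min(s, 0.0) for s in slacks_ns)


def tns_improvement_factor(baseline_tns, new_tns):
    """Ratio of TNS magnitudes, e.g. 3.2 for a "3.2X" TNS improvement."""
    if new_tns == 0.0:
        return float("inf")  # every violation was fixed
    return baseline_tns / new_tns  # both values are <= 0, so the ratio is >= 0


def percent_reduction(baseline, new):
    """Relative reduction in percent, e.g. for leakage power."""
    return 100.0 * (baseline - new) / baseline


# Hypothetical per-endpoint slacks in ns (positive slack = timing met).
baseline_slacks = [-0.30, -0.20, -0.10, 0.05]
tuned_slacks = [-0.10, -0.05, 0.02, 0.06]

base_tns = total_negative_slack(baseline_slacks)
new_tns = total_negative_slack(tuned_slacks)
print(f"TNS {base_tns:.2f} ns -> {new_tns:.2f} ns "
      f"({tns_improvement_factor(base_tns, new_tns):.1f}X better)")
print(f"leakage 100 -> 62 units: {percent_reduction(100.0, 62.0):.0f}% lower")
```

Because only failing endpoints contribute, TNS captures how much timing work remains across the whole design. A large relative TNS gain can therefore coexist with small changes in the worst single path.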

Figure 2: PPA enhancement

Phenomenal Productivity Improvements

To optimize the 3.5-million-instance “main” partition, Cerebrus ran more than 80 scenarios and delivered the first round of optimized results in 4–5 weeks, as shown in Figure 3. After this cold start, Cerebrus produces an ML model that can be reused for subsequent optimizations. This allows faster convergence and more efficient use of compute resources, which improves both productivity and PPA.
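A deliberately simple sketch can illustrate the difference between a cold start and a warm start. Everything here is hypothetical: the knobs, the toy `flow_cost` model, and the "ML model". That model is just the average cost observed for each knob value, not anything Cerebrus actually uses. The cold start runs every candidate on one RTL drop. The warm start ranks candidates using the learned averages and runs only the top few on the next drop.

```python
import itertools
from statistics import mean

# Hypothetical knobs; names and values are illustrative assumptions only.
KNOBS = {
    "place_density": [0.60, 0.65, 0.70],
    "timing_effort": ["medium", "high"],
    "leakage_effort": ["low", "high"],
}


def all_settings():
    """Every combination of knob values, as dicts."""
    names = sorted(KNOBS)
    for values in itertools.product(*(KNOBS[n] for n in names)):
        yield dict(zip(names, values))


def flow_cost(params, drop):
    """Toy stand-in for one full flow run on RTL drop `drop` (lower is better).

    Assumption: later drops shift the absolute numbers but keep the trends,
    which is what makes a model learned on an earlier drop reusable.
    """
    cost = 20.0 * params["place_density"]
    cost -= 1.5 if params["timing_effort"] == "high" else 0.0
    cost -= 1.0 if params["leakage_effort"] == "high" else 0.0
    return cost + 0.1 * drop


def cold_start(drop):
    """No prior knowledge: run the flow for every candidate setting."""
    return [(p, flow_cost(p, drop)) for p in all_settings()]


def fit_model(runs):
    """Learn the average cost seen for each (knob, value) pair."""
    return {
        (name, value): mean(c for p, c in runs if p[name] == value)
        for name, values in KNOBS.items()
        for value in values
    }


def predict(model, params):
    """Predicted cost of a setting: sum of its per-knob averages."""
    return sum(model[(name, value)] for name, value in params.items())


def warm_start(model, drop, top_k=2):
    """Reuse the model: run the flow only for the top-ranked settings."""
    ranked = sorted(all_settings(), key=lambda p: predict(model, p))
    return [(p, flow_cost(p, drop)) for p in ranked[:top_k]]


cold = cold_start(drop=1)
warm = warm_start(fit_model(cold), drop=2)
best_cold = min(cold, key=lambda run: run[1])
best_warm = min(warm, key=lambda run: run[1])
print(f"{len(cold)} cold-start runs vs {len(warm)} warm-start runs")
print("same best setting:", best_warm[0] == best_cold[0])
```

With these toy numbers, the warm start needs 2 flow runs instead of 12 and still finds the same best setting. That result depends entirely on the assumption that `flow_cost` keeps its trends from drop to drop. On a real design, the benefit depends on how well the earlier model transfers.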

Figure 3: Arm productivity enhancement with Cadence Cerebrus

As Figure 3 shows, Cerebrus converged the design within 2–3 weeks for subsequent RTL drops. For the final netlist, the “Replay” flow closed the design in a week. Reusing Cerebrus ML models improved productivity significantly, by almost 50% in this case.
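One way to relate the turnaround times to the productivity figure rests on two assumptions: that the figure compares per-drop turnaround, and that the midpoint of each range is representative.

```latex
\text{saving} = \frac{T_{\text{cold}} - T_{\text{warm}}}{T_{\text{cold}}}
\approx \frac{4.5\,\text{weeks} - 2.5\,\text{weeks}}{4.5\,\text{weeks}} \approx 44\%
```

Using the ends of the ranges instead gives 40% (5 to 3 weeks) to 50% (4 to 2 weeks).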

Cadence RAKs

Cadence Rapid Adoption Kits (RAKs) also support smooth implementation of Arm server-class platforms. RAKs package these implementation methodologies so that partners can implement data center server-class CPUs more efficiently and reach tapeout sooner. They provide a complete digital flow for meeting the PPA goals of many Arm cores, including Neoverse CPUs. They can also give a project a jump start, saving three to six months of crucial time depending on project specifics.

Figure 4: RAKs for Arm cores

How Arm Achieved the Best PPA and TTM

Arm uses the Cadence Full Flow digital design platform, composed of the tools below, together with a hierarchical flow for CPU core development to achieve its PPA goals. Figure 5 shows the major Cadence tools and features that enabled Arm to achieve maximum frequency.

Genus Synthesis Solution for RTL physical synthesis
Innovus Implementation System for place and route
Quantus for signoff parasitic extraction
Tempus for path-based analysis (PBA) timing signoff and engineering change orders (ECOs)
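The tool list above reads as a pipeline in which each tool consumes what an earlier one produces. Below is a minimal sketch of that data flow. The artifact names (`netlist`, `layout`, `parasitics`, and so on) are generic labels assumed for illustration, not the actual files or formats in Arm's flow. The `check_flow` helper is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    tool: str
    task: str
    consumes: frozenset
    produces: frozenset


# Simplified, assumed data flow between the stages (one pass, no iterations).
FLOW = [
    Stage("Genus", "RTL physical synthesis",
          frozenset({"rtl", "constraints"}), frozenset({"netlist"})),
    Stage("Innovus", "place and route",
          frozenset({"netlist", "constraints"}), frozenset({"layout", "routed_netlist"})),
    Stage("Quantus", "signoff parasitic extraction",
          frozenset({"layout"}), frozenset({"parasitics"})),
    Stage("Tempus", "PBA timing signoff and ECO generation",
          frozenset({"routed_netlist", "parasitics", "constraints"}),
          frozenset({"timing_report", "eco"})),
]


def check_flow(flow, inputs):
    """Verify each stage's inputs exist before it runs; return all artifacts."""
    available = set(inputs)
    for stage in flow:
        missing = stage.consumes - available
        if missing:
            raise ValueError(f"{stage.tool} is missing inputs: {sorted(missing)}")
        available |= stage.produces
    return available


artifacts = check_flow(FLOW, {"rtl", "constraints"})
print(sorted(artifacts))
```

In practice, the ECOs produced at the signoff stage are fed back into implementation, so the real flow iterates rather than running once. The sketch models only a single pass to show the ordering.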

Figure 5: Cadence features

Summary

Cadence Cerebrus enables Arm and its partners to develop Arm-based infrastructure SoCs that meet ever-changing customer expectations and the market's need for efficient cloud infrastructure that can handle the data explosion and complex workloads. The key factors behind the success of Cadence Cerebrus in the Arm 3nm infrastructure core implementation are:

Integration into the existing flow
Significant PPA improvements, together with quality-of-results (QoR) and productivity gains that reduce the engineering effort, CPU usage, and time needed to achieve the desired PPA
More effective use of available compute resources than a traditional PPA push