Arm is at the heart of the world’s most advanced digital products; over 70% of the world’s population uses Arm processor technology. The most innovative applications, from sensors to supercomputers (Fugaku) and servers to the cloud, are powered by Arm-based chips. Many companies use cloud-based infrastructure to run their workloads for improved performance, lower costs, better resource utilization, reduced time to market, and improved quality. Arm-based servers are increasingly adopted in cloud and on-premises deployments because of performance and cost benefits.
As mentioned at AWS re:Invent 2022, Graviton3 runs vCPUs faster and can complete the same workload with 60% less energy consumption; Graviton3 can also deliver 40% more performance per dollar compared to fifth-generation x86. Arm is shifting its EDA workloads to AWS Graviton3 after studying the performance of the Graviton family of AWS instances across various Cadence EDA workloads. This blog post highlights the cost efficiency (server) and performance improvements of the new AWS Graviton3 CPU and the C7g instance family for Arm’s EDA (Cadence) workloads.
Arm processors have evolved considerably since their inception and early use in mobile phones. As technology has advanced and customer demands have grown, Arm processors have become more powerful and capable, with larger caches and out-of-order execution.
For instance, to meet the rising data demands driven by growth in mobile computing, IoT, and increasingly connected devices, Arm launched interconnects such as CoreLink CCN-502 and CoreLink CCN-512. This extended the scalability of infrastructure compute and showcased Arm’s commitment to a flexible architecture from sensors to servers. In recent years, Arm-based servers have become increasingly popular because of their lower power consumption and better energy efficiency compared with x86-based servers. Arm has shipped over 240 billion Arm-based processors to date, and Arm partners shipped 29 billion last year! However, designing these processors was not easy. The engineering teams faced many challenges, including:
Joint solutions from Cadence and Arm help partners develop unique Arm-based designs more quickly and efficiently. Arm-based server demand is rising due to cost and performance benefits observed in both cloud and on-premises environments. To meet this demand, Arm partners are building Arm-based servers for cloud consumption and marketing them for use cases such as EDA tools, edge AI, machine learning workloads, and data analytics. AWS is one of these partners, designing Arm-based server processors such as the Graviton family with Annapurna Labs.
AWS helps Arm build a sustainable cloud compute infrastructure. Some key benefits of cloud over on-premises infrastructure are outlined below:
EDA is a combination of workloads involving several tools. Depending on the tool being used and other factors such as process technology and design size, the compute and memory requirements of an EDA flow can vary widely. On-premises EDA environments typically host homogeneous infrastructure, leading to suboptimal utilization of compute and storage. Attribute-based instance selection on AWS helps find the right instance for each workload, which improves resource utilization. The elasticity of AWS ensures you pay only for what you use, thereby optimizing EDA cost.
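The idea behind attribute-based selection can be sketched in a few lines of Python: state what the job needs (vCPUs, memory) and pick the cheapest instance that satisfies it. This is only an illustrative sketch, not the AWS API. The catalog entries, their names, and their prices are hypothetical, and `select_instance` is a helper invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str            # hypothetical label, not a real AWS instance name
    vcpus: int
    memory_gib: int
    hourly_price: float  # made-up price, for illustration only


# Hypothetical catalog; real attributes and prices would come from AWS.
CATALOG = [
    InstanceType("compute-64", vcpus=64, memory_gib=128, hourly_price=2.0),
    InstanceType("general-64", vcpus=64, memory_gib=256, hourly_price=2.5),
    InstanceType("memory-64", vcpus=64, memory_gib=512, hourly_price=3.5),
    InstanceType("compute-16", vcpus=16, memory_gib=32, hourly_price=0.5),
]


def select_instance(min_vcpus: int, min_memory_gib: int,
                    catalog=CATALOG) -> Optional[InstanceType]:
    """Return the cheapest catalog entry that meets both requirements."""
    candidates = [
        inst for inst in catalog
        if inst.vcpus >= min_vcpus and inst.memory_gib >= min_memory_gib
    ]
    # default=None covers the case where nothing in the catalog fits.
    return min(candidates, key=lambda inst: inst.hourly_price, default=None)


# A simulation-style job: many cores, modest memory.
print(select_instance(min_vcpus=32, min_memory_gib=64).name)
# A memory-hungry job: same core count, much more memory.
print(select_instance(min_vcpus=32, min_memory_gib=300).name)
```

In a homogeneous on-premises cluster, both jobs would land on the same machine type; selecting by attributes lets the lighter job run on a cheaper instance while the memory-heavy job still gets what it needs.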
With AWS’s virtually infinite capacity, you can follow the EDA demand curve (second graph) and eliminate job wait time, which is engineering productivity loss (and engineers are the most expensive resource in a chip design project). Headcount expense can be up to 70% of project cost:
Spot Instances on AWS turn idle capacity into productive CPU capacity and are available at a lower server cost (with the caveat that availability depends on demand).
Using AWS Graviton gave Arm access to virtually unlimited capacity, enabled it to scale up to 350K cores seamlessly, and helped enable ML on EDA. As a result, Arm could perform better verification, run more verification jobs, and converge on its designs faster, resulting in improved quality and reduced time to market (TTM).
Arm processors offer more performance and a lower server cost, and can complete the same workload with 60% less energy than equivalent x86-based instances.
A broad suite of Cadence EDA tools allows Arm to simulate and verify RTL. Cadence tools used at various front-end and back-end stages include Xcelium, JasperGold (now Jasper), Spectre for circuit simulation (known as SPICE), and Liberate for physical characterization. Arm uses all these tools on Graviton-powered instances from AWS and in its on-prem and x86 clusters. The AWS Graviton family offers better performance and lower costs for all tested EDA workflows. Moreover, the rate of performance improvement and cost reduction on Graviton is outpacing x86. Arm studied the performance of the Graviton family of AWS instances across various Cadence EDA workloads, comparing Graviton2 (C6g) with Graviton3 (C7g) processors.
Within the Graviton family, moving from A1 to C6g resulted in a 52% performance speed-up, and moving from C6g to C7g resulted in a further 21% speed-up. Similarly, each new generation of the Graviton family has decreased the server cost. Comparing the latest generations, C7g and C6i, the Arm-based AWS Graviton3 offers a 50% cost advantage over x86.
Arm’s front-end EDA workloads tell a similar story. On Cadence’s Xcelium platform, Graviton3 improves runtime performance by 22% over Graviton2, and the total job cost improves by 12%. On Cadence’s JasperGold platform, Graviton3 runtime performance improves by 30%, with a corresponding cost improvement of 18% over Graviton2.
AWS Graviton3 shows even more significant improvement on Arm’s back-end EDA workloads, which use floating-point operations heavily. With Cadence’s Spectre Simulation, the runtime performance of Graviton3 improves by 35% over Graviton2, with a corresponding 22% cost improvement. With Cadence Liberate Characterization, Graviton3 improves performance by 33% and cost by 21% over Graviton2.
For each comparison, C6g.16xlarge instances represent AWS Graviton2, and C7g.16xlarge instances represent Graviton3.
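In every comparison above, the cost improvement is smaller than the runtime improvement. One way to read that gap is sketched below, under assumptions that the text does not state: that job cost is runtime multiplied by hourly instance price, that an "X% performance improvement" means the job runs (1 + X/100) times faster, and that a "Y% cost improvement" means the job costs Y% less. The function name `implied_price_ratio` is invented for this example, and the output is an inference from the quoted percentages, not a published price.

```python
def implied_price_ratio(speedup_pct: float, cost_improvement_pct: float) -> float:
    """Hourly price ratio (new / old) implied by the two percentages.

    Assumes job cost = runtime * hourly price, that an "X% performance
    improvement" means the job runs (1 + X/100) times faster, and that a
    "Y% cost improvement" means the job costs Y% less.
    """
    runtime_ratio = 1.0 / (1.0 + speedup_pct / 100.0)  # new runtime / old runtime
    cost_ratio = 1.0 - cost_improvement_pct / 100.0    # new job cost / old job cost
    return cost_ratio / runtime_ratio


# (runtime improvement %, cost improvement %) as quoted in the text
results = {
    "Xcelium": (22, 12),
    "JasperGold": (30, 18),
    "Spectre": (35, 22),
    "Liberate": (33, 21),
}

for tool, (speedup, saving) in results.items():
    ratio = implied_price_ratio(speedup, saving)
    print(f"{tool}: implied price ratio {ratio:.3f}")
```

Under these assumptions, all four workloads imply an hourly price ratio slightly above 1, which would be consistent with Graviton3 instances costing somewhat more per hour than Graviton2 while still lowering the total cost per job. If "performance improvement" instead means a percentage reduction in runtime, the implied ratios would differ.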