
Author

Paul McLellan

Tags: CDNLive, Barefoot Networks, CloudBurst, Cadence Cloud, CDNLive Silicon Valley

Barefoot in a CloudBurst: Tempus on 2000+ CPUs

9 Apr 2019 • 5 minute read

Barefoot Networks gave a couple of presentations at the recent CDNLive Silicon Valley. Both were about the latest member of their Tofino family of ASICs, a fully programmable networking chip with a throughput of 12.8Tbps, over 500M instances, and over one thousand clocks. I'm going to cover Karthik Kannan's presentation on how they did timing signoff, with a few facts from Pradeep Nagaraj's presentation on how they did test generation. Karthik's presentation was somewhat innocently titled STA Signoff of 500M+ Instance 7nm ASIC with Tempus.

CloudBurst

I say "innocently" because they actually used a product that couldn't be mentioned in the title since Cadence only announced it the day before CDNLive started; namely, CloudBurst. This is the latest offering in the Cadence Cloud portfolio. I wrote about it in my post that morning, so I won't go over all the details again. You can read the post CloudBurst: The Best of Both Worlds. However, the one-sentence summary is that CloudBurst enables companies with large on-premises (on-prem) datacenters such as Barefoot, to use the cloud to handle peak loads, as they did for timing signoff.

A pretty good summary of Karthik's talk is the quote in the press release announcing CloudBurst:

We successfully ran more than 500 million instances flat using the fully distributed Cadence Tempus Timing Signoff Solution on the CloudBurst platform via AWS to complete the tapeout of our latest networking chip on TSMC's 7nm process. This would have been impossible to achieve in the required timeframe if we hadn’t deployed the Cadence hybrid cloud solution, which offered quick and easy access to the massive compute power we needed and a 10X productivity improvement over an on-premises static timing analysis approach for final signoff.

Here's a bit more technical detail to give you an idea of the complexity of what they were doing (there's a quick sketch of how the views multiply out after the list):

  • 2 modes (functional and scan shift) across 9 corners (PVT + RC)
  • Statistical OCV (SOCV) based STA (use of Liberty views in Liberty Variation Format, or LVF)
  • Delta-Voltage and Delta-Temperature derates
  • Wire OCV derates
  • Spatial SOCV derates (specified locations of blocks for location-based OCV derates)
  • Uncertainty values for setup and hold timing
  • Clock mesh annotations
  • Data and clock PBA (Path-Based Analysis with ‘path’ mode recalculation)
  • CPPR threshold of 1 ps
  • Tempus 17.24-s033_1
  • 1.5-year design process
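
To get a feel for the scale the first bullet implies, here is a minimal sketch of how the view matrix multiplies out (the corner names are hypothetical, since the talk didn't list them): 2 modes across 9 corners means 18 distinct signoff views, each one a full analysis of the 500M+ instance netlist.

```python
from itertools import product

# Hypothetical labels; the talk didn't name the individual corners.
modes = ["functional", "scan_shift"]
corners = [f"corner_{i}" for i in range(1, 10)]  # 9 PVT + RC corners

views = list(product(modes, corners))
print(len(views))  # 18 signoff views, each a full STA of the 500M+ instance design
```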

While the design was being done, Barefoot ran a separate STA of 4 sub-designs and the full-chip (FC) on a weekly basis. Based on Pradeep's presentation, the chip consists of a large number of processor blocks with complex interconnect logic linking them, which had to be tested at speed. For the STA of these sub-blocks (obviously a lot smaller than the whole chip) they used Tempus STA, but for the FC they used Tempus DSTA (Distributed STA), which partitions the design over multiple servers and uses multiple threads on each server to perform signoff. The servers were either in Barefoot's on-prem datacenters or on Amazon AWS using CloudBurst.
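
To make the distinction concrete, here is a toy sketch of the idea behind distributed STA: carve the design into pieces, hand each piece to a server, and let each server run multithreaded. The block names and the deliberately naive partitioner are my own illustration, nothing like Tempus's real partitioning.

```python
# A toy illustration of the idea behind DSTA: split the design across servers,
# then use multiple threads on each one. Conceptual sketch only; Tempus's
# actual partitioner is far more sophisticated.

def partition(blocks: list[str], n_servers: int) -> list[list[str]]:
    """Round-robin blocks across servers. A real partitioner balances by
    instance count and minimizes the number of timing paths that get cut."""
    groups: list[list[str]] = [[] for _ in range(n_servers)]
    for i, block in enumerate(blocks):
        groups[i % n_servers].append(block)
    return groups

# Hypothetical names standing in for the processor blocks and interconnect.
design = [f"proc_block_{i:02d}" for i in range(16)] + ["interconnect"]
for server, work in enumerate(partition(design, 4)):
    print(f"server {server}: {work}")
```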

A little more detail on how CloudBurst appears to the user. It is a web-based cloud environment that is set up and maintained by Cadence (as opposed to by Barefoot's IT department). Getting it set up requires standard NDAs between the customer (Barefoot), the foundry partner TSMC, and Cadence. Potentially there might be other IP suppliers involved too. The CloudBurst platform provides:

  • Capability for adding or releasing servers based on compute demand
  • Predefined and provisioned compute queues tailored to the customer's needs
  • License server
  • Tempus software
  • Monitoring software
  • CMS (Compute Management System)
  • 24/7 monitoring of jobs, plus support to address any environment issues
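
The first bullet, elastic capacity, is the heart of the peak-load story. Here is a minimal sketch of what that kind of scaling rule looks like; the queue model and thresholds are invented for illustration, since CloudBurst's actual policy wasn't described.

```python
# A toy scaling rule illustrating "adding or releasing servers based on
# compute demand." The queue model and limits are assumptions, not
# CloudBurst's actual policy.

def target_servers(queued_jobs: int, jobs_per_server: int = 2,
                   min_servers: int = 1, max_servers: int = 64) -> int:
    """Size the pool to the queue depth, clamped to the pool limits."""
    needed = -(-queued_jobs // jobs_per_server)  # ceiling division
    return max(min_servers, min(max_servers, needed))

for queued in (0, 3, 18, 500):
    print(f"{queued:>3} queued jobs -> {target_servers(queued)} servers")
```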

Timing runs were actually performed in chambers, where each chamber contained a read-only vault directory with library and foundry data, and a work directory for the timing runs. A chamber can hold up to 16TB of data and is secure and redundant. For this particular chip, a chamber was large enough to run full-chip DSTA for 5 views. For security, a chamber could only be accessed through white-listed IP addresses, which caused some issues when working from home through VPN connections; this was eventually solved by setting up a web browser on a VNC server on the internal network. Another challenge is that the chambers are so secure that you don't get any notification about your jobs (such as email). You can go and look to see if jobs are complete, but they can't proactively notify you.
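
Since the chambers can't push notifications, checking on jobs means polling. Something as simple as the following captures the "go and look" workflow; the marker-file convention and path are my invention, not part of CloudBurst.

```python
import time
from pathlib import Path

# Chambers can't email you, so completion has to be polled. The DONE-marker
# convention and the path are hypothetical, purely for illustration.
def wait_for_job(done_marker: Path, poll_seconds: int = 300) -> None:
    while not done_marker.exists():
        time.sleep(poll_seconds)  # check every five minutes
    print(f"{done_marker} present: timing run complete")

# Example: wait_for_job(Path("/chamber/work/fc_dsta_view3/DONE"))
```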

CloudBurst contains an express FTP utility to upload data into the chamber, which is much faster than regular FTP: the 350GB design uploaded in four hours, versus 26 hours with regular FTP. The data is encrypted during transfer and at rest in the cloud. Subsequent uploads and uncompressing went quicker, since only delta data needed to be transferred after the first upload.
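
Those numbers imply a 6.5X transfer speedup; here's a quick back-of-the-envelope check of the effective throughput.

```python
# Back-of-the-envelope check of the reported upload times for the 350GB design.
design_gb = 350

for label, hours in (("express utility", 4), ("regular FTP", 26)):
    mb_per_s = design_gb * 1000 / (hours * 3600)
    print(f"{label}: {mb_per_s:5.1f} MB/s")  # ~24.3 vs ~3.7 MB/s

print(f"speedup: {26 / 4:.1f}X")  # 6.5X
```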

One challenge with a job on this scale is making sure that jobs land on the right set of clients, and use the right number of clients. At the peak, Barefoot was simultaneously using over 2000 CPUs to perform DSTA of all views in parallel. They typically used one master host with 16 cores and at least 1TB of RAM, and 8 client hosts with 32 cores and at least 960GB of RAM each. Each client host ran two processes, and each process used 16 threads and up to half the RAM (480GB).
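
Those per-view numbers multiply out consistently with the 2000+ CPU peak. Here's a quick check, under my assumption that each view ran on one master plus eight clients as described.

```python
# Sanity check on the peak CPU count, assuming each view used one 16-core
# master plus 8 clients of 32 cores each, as described above.
master_cores, n_clients, client_cores = 16, 8, 32
cores_per_view = master_cores + n_clients * client_cores
print(cores_per_view)                # 272 cores per view

# Each client ran 2 processes x 16 threads = 32 threads, one per core,
# with each process capped at half of the 960GB RAM (480GB).
print(2 * 16, "threads per client")  # matches the 32 cores

# Around 8 views in flight at once crosses the 2000-CPU mark.
print(8 * cores_per_view)            # 2176
```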

Results

The one-line summary is that cloud-based runs delivered an overall turnaround-time speedup of 2-5X compared to local runs. For a true apples-to-apples comparison, you have to account for the fact that Barefoot's on-prem servers were actually faster than the ones they chose to access in the cloud. If you level the playing field, the CloudBurst solution ends up delivering up to a 10X throughput advantage (with the 9 corners run in parallel rather than sequentially). Their original plan, before they went with CloudBurst, had been to run those corners one after another.
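
For intuition on how a 2-5X wall-clock number becomes a 10X throughput claim, here is my own rough reconstruction of the arithmetic, not a breakdown Karthik gave: corner-level parallelism alone is worth up to 9X, and normalizing for the faster on-prem hardware closes the rest of the gap.

```python
# Rough reconstruction of the throughput claim, for intuition only.
# Running 9 corners in parallel instead of sequentially is up to a 9X win
# by itself; normalizing for the faster on-prem servers (an assumed ~1.1X
# hardware gap, not a figure from the talk) gets to roughly 10X.
corners = 9
assumed_hw_gap = 1.1  # assumption: on-prem servers somewhat faster
print(f"corner parallelism alone: up to {corners}X")
print(f"hardware-normalized:      ~{corners * assumed_hw_gap:.0f}X")
```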

More Details

The presentation should appear on the CDNLive Silicon Valley page within a week or two. For more general information, see the CloudBurst page.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email

