As you can guess from the title of this post, TSMC, Cadence, and Microsoft have been working together on signoff in the cloud. Since signoff comes late in the design cycle and is CPU intensive, it is an ideal part of the flow to move to the cloud. It makes no sense to provision in-house hardware for the peak needs of signoff, since those servers will often sit idle the rest of the time, especially in companies that only create a few SoC designs per year.
There is a TSMC white paper on this topic. It was presented at CadenceLIVE Americas. And, on December 8, there will be a webinar on the topic. Details and registration links at the end of this post.
You might expect a blog post on signoff to discuss process corners, or voltage-based timing, or variation. But the topic of the white paper (and this post) is more about the practicalities of using cloud scalability. Cloud is always pay-per-use. But there are smart ways to use it that both reduce the number of virtual machines (VMs) required and still speed up timing signoff substantially.
There is a tradeoff between the amount (and cost) of computing power and the run time. If you are extravagant, you can consume a lot of computing power for a minimal decrease in run time. If you are too parsimonious, your run time will increase dramatically. In between, there is a sweet spot. One of the challenges with the cloud is to make sure whatever machines you pick are being fully utilized. A trivial example is that if you use a VM with a huge amount of memory but the design is not that large, then you are going to waste memory (and the money you pay to have it available).
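The tradeoff above can be made concrete with a small sketch. All the numbers here are hypothetical: I assume an Amdahl-style workload where some portion of the run is serial and the rest parallelizes perfectly, and that cloud billing is simply VM-hours used.

```python
# Sketch of the compute-cost vs. run-time tradeoff (all numbers hypothetical).
# Assume a job with a serial portion and a perfectly parallel portion:
# runtime(n) = serial + parallel / n for n VMs. Pay-per-use billing means
# every VM is billed for the full duration of the run.

def runtime_hours(n_vms, serial=2.0, parallel=48.0):
    """Hours to finish with n_vms machines (hypothetical workload model)."""
    return serial + parallel / n_vms

def cost(n_vms, hourly_rate=1.0):
    """Total spend: all n_vms machines are billed for the whole run."""
    return n_vms * runtime_hours(n_vms) * hourly_rate

# Runtime always drops as VMs are added, but cost keeps climbing once the
# extra VMs spend most of the run waiting on the serial portion. The sweet
# spot is where added speedup no longer justifies the added spend.
for n in (1, 4, 8, 16, 64):
    print(f"{n:3d} VMs: {runtime_hours(n):6.2f} h, cost {cost(n):7.2f}")
```

Under these assumptions, going from 1 to 4 VMs cuts the run from 50 hours to 14 for only a 12% cost increase, while going from 16 to 64 VMs barely halves the runtime and more than doubles the cost.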
Working with Cadence and Microsoft, TSMC has come up with two particular strategies that they call "scale-out" and "scale-in".
The scale-out strategy uses more cloud VMs to reduce the number of runs that end up executing sequentially because there are not enough servers to run them all in parallel. People tend to think that using many more VMs in the cloud will increase cost substantially, but that is not necessarily the case. In the diagram below, if the total run-time speedup ratio A is larger than the VM count increase ratio B, designers enjoy both cost savings and run-time reduction at the same time.
The scale-in strategy packs more jobs into each VM to reduce the total number of cloud VMs required. The full-blown scale-out strategy already discussed speeds up turnaround time to the extreme, but it may not be the most cost-effective approach. The scale-in strategy guides designers in balancing run-time speedup against cost. In the diagram below, if the VM count decrease ratio C is larger than the run-time increase ratio D, designers achieve further cost savings while still keeping turnaround time relatively short.
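The two decision rules can be written down directly. Since total cloud cost is roughly VMs multiplied by runtime, each rule reduces to comparing two ratios; the example numbers below are hypothetical.

```python
# Decision rules for the two strategies, using the ratios A, B, C, D
# from the white paper's diagrams (example values are hypothetical).

def scale_out_wins(runtime_speedup_A, vm_increase_B):
    """Scale-out saves money AND time when the run-time speedup ratio A
    exceeds the VM-count increase ratio B, because cost scales roughly
    as VMs x runtime, i.e. the cost changes by a factor of B / A."""
    return runtime_speedup_A > vm_increase_B

def scale_in_wins(vm_decrease_C, runtime_increase_D):
    """Scale-in saves further cost when the VM-count decrease ratio C
    exceeds the run-time increase ratio D (cost factor D / C < 1)."""
    return vm_decrease_C > runtime_increase_D

# Quadrupling the VM count (B = 4) for a 5x speedup (A = 5) cuts cost
# by a factor of A/B = 1.25 while finishing five times sooner.
assert scale_out_wins(5.0, 4.0)

# Halving the VM count (C = 2) at 1.5x longer runtime (D = 1.5) still
# trims total spend by roughly C/D = 1.33x.
assert scale_in_wins(2.0, 1.5)
```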
A key consideration when using the scale-in strategy is the peak memory footprint required by each EDA job. The total cost of cloud runs is determined by two factors: the number of VMs used and the run time. If you become overly aggressive and squeeze too many jobs into one VM, exceeding the total memory available, the EDA tools will resort to memory swapping and the run time will slow down excessively. In that situation, you not only sacrifice the possible run-time speedup but also spend more money than needed, even while using fewer VMs. Peak memory per job therefore becomes a primary consideration that designers must pay special attention to in order to strike the right balance when doing timing signoff in the cloud.
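A minimal sketch of this packing constraint, with hypothetical VM sizes and per-view memory: keep some headroom below the VM's physical memory so no job ever spills into swap.

```python
# Hypothetical sketch of the scale-in memory constraint: how many jobs
# (timing views) fit on one VM without risking swap. The VM sizes,
# per-view footprint, and 10% headroom here are illustrative only.

def views_per_vm(vm_memory_gb, peak_view_memory_gb, headroom=0.9):
    """Max parallel views per VM, reserving headroom so the combined
    peak footprint never exceeds physical memory (swapping would slow
    the run and raise cost at the same time)."""
    usable = vm_memory_gb * headroom
    return int(usable // peak_view_memory_gb)

# With a hypothetical 100 GB peak per view:
print(views_per_vm(350, 100))  # a smaller VM fits 3 views
print(views_per_vm(450, 100))  # a larger VM fits 4 views
```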
To show how these two strategies work, and how to balance them, let's look at a sample design. This is a 37 million instance design in TSMC N5 technology with 150 timing views and 12 extraction corners. The following are key highlights from the successful execution of the scale-out and scale-in strategies:
It is also worth pointing out that there is sometimes no real advantage in paying extra to get a job to run faster if nobody will be able to use the results. For example, an overnight run might take 12 hours, with the output ready at the start of the following day. By paying extra, it might be possible to get the job to finish at 3am, but for practical purposes that is the same as overnight, since the engineering team is asleep. Of course, some teams make use of timezones to effectively run 24 hours per day, in which case somebody will be up and about at 3am.
By moving to the cloud, users can experience a significant improvement in productivity, no longer constrained by on-premises hardware limitations. By strategically applying the scale-in and scale-out strategies, users can accelerate their schedules while maintaining a high return on investment in cloud hardware expense. The CloudBurst Platform, built on top of and integrated with Azure, provided the connectivity model, the user interface, job control, the security model, and file transfer capabilities.
The sample design on CloudBurst was handled by two classes of machine:
The basic core is the same in both grades, with the same performance. The difference is in the number of cores and the amount of memory.
Using Tempus DMMMC (distributed multi-mode multi-corner), it was found that three views could run in parallel on a single low-grade machine, or four views in parallel on a high-grade machine. This was limited by the memory footprint of each view. If all the views cannot run in parallel, the scheduler in the Tempus solution runs more views as machines become available until all the assigned views are completed.
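The scheduling behavior described above can be approximated with a simple wave model: with more views than available slots, views run in successive waves until all are done. This is a hypothetical sketch, not the actual Tempus scheduler, and the per-view runtime is illustrative.

```python
import math

# Hypothetical wave model of distributed view scheduling: V views run on
# M machines, each handling P views in parallel, in successive waves.
# (This is an illustration, not the actual Tempus scheduling algorithm.)

def walltime(views, machines, views_per_machine, hours_per_view=2.0):
    """Estimated walltime assuming equal-length views run in waves."""
    slots = machines * views_per_machine
    waves = math.ceil(views / slots)
    return waves * hours_per_view

# 150 views on 10 high-grade machines (4 parallel views each) needs
# ceil(150/40) = 4 waves; on 50 machines everything fits in one wave.
print(walltime(150, 10, 4))  # -> 8.0 hours under these assumptions
print(walltime(150, 50, 4))  # -> 2.0 hours
```

The model makes the scale-out/scale-in tradeoff visible: a fivefold increase in machines here only quarters the walltime, because the last wave of the 10-machine case was mostly empty.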
The table below shows the same design being run on different combinations of machines. The runs circled in green are scale-in strategy; in red, scale-out; no color in between. Note that "walltime ratio" is the ratio of the fastest, most scaled-out run (with 150 E64 machines) to the remainder of the runs.
The conclusions, from the white paper, for the Tempus solution are:
The white paper goes through an analysis of how best to make use of the Quantus solution's multi-corner strategy, which operates very differently from the Tempus solution. I'll skip to the recommendations in the white paper.
The upcoming webinar is on December 8. Details are on the webinar page, including a link for registration.