In July 2011 Cadence purchased Azuro, a provider of "clock concurrent optimization" technology (ccopt, pronounced "c-c-opt"). Going far beyond traditional clock tree synthesis, this technology combines "useful skew" clock tree synthesis with timing-driven placement, logic re-sizing, clock gating, and post-clock tree optimization into a single step.
As Paul Cunningham, Azuro co-founder and former CEO, explains in this interview, clock concurrent optimization represents a significant leap forward in digital implementation for ICs. The ccopt technology is available immediately to Cadence Encounter Digital Implementation System users. Cunningham is now senior engineering group director at Cadence, where he oversees the former Azuro R&D team. Further information about Azuro and ccopt is available in a previous blog post.
Q: Paul, let's start with some background. When was Azuro founded, by whom, and why?
A: Azuro was founded in 2002 by myself and Steev Wilcox. Both of us had PhDs in asynchronous design. Since asynchronous design is about making chips without any clocks, it forces you to think about why you need clocks. There's nothing like getting rid of clocks to focus your mind on what they were there for in the first place.
Our journey led us to the conclusion that three key industry trends -- on-chip variation, clock gating, and clock complexity -- were compounding to cripple the fundamental principle on which CTS tools are based, which is that tight skew implies convergence between pre-CTS and post-CTS timing in the flow. So we needed to rethink clocking from the ground up as a new technology we call clock concurrent optimization.
Q: How do you explain clock concurrent optimization?
A: Clock concurrent optimization is the idea that the clock schedule and logic delays must be optimized together and in harmony, instead of doing one and then the other. There is a strong parallel between this idea and what happened 10 years ago with physical synthesis, when people realized that placement and logic sizing had to be done concurrently. Similarly, we're saying that it no longer makes sense to skew-balance clocks and then tweak the logic delays afterwards in a separate step. Clocks have to be timing-driven, and as soon as your clock tree synthesis becomes timing-driven, you can't separate it from placement and you can't separate it from logic sizing, so you have to go to clock concurrent optimization.
Q: Where does clock concurrent optimization fit into the design cycle?
A: It replaces both the traditional clock tree synthesis step and the traditional post-clock tree synthesis step with a single unified step. But the idea of clock concurrence, in my view, spans the whole flow, right from the way you do front-end design down to post-route optimization. What we did at Azuro was just a beginning. One of the exciting things about the Cadence acquisition of Azuro is that we can now take the clock concurrent concept and expand it throughout the digital implementation flow.
Q: What power, performance and area benefits have you seen with ccopt?
A: We have seen total power reductions up to 10%, performance improvements up to 100 MHz for a GHz design, and clock tree area reduction up to 30%. The actual benefits a particular user sees are a function of the effort that they have put into whatever we are comparing against. Sometimes customers have spent six months fine-tuning everything by hand. In that case maybe we show only a small power or performance difference, but we can achieve this without six months of manual tuning, and we can achieve this pretty much "out of the box," which is a significant benefit in and of itself.
Q: You use a concept called "useful skew." How does that work?
A: Useful skew is the idea that you don't need to schedule the arrival of clocks simultaneously across all registers. To use an analogy, the traditional approach to skew is like saying you're going to work eight hours every day even if you've got nothing to do. The vision that we have is that you're going to work a 40-hour week, but you don't have to work the same number of hours each day. You get the job done but don't have to get it done in a rigorous fashion.
But the way useful skew is implemented in traditional flows is like first assuming you're working eight hours a day, and then afterwards doing everything you can to juggle your schedule on this assumption. At the very end of the day, as a last resort, you do a little skewing where you can't quite make it. That's not the same thing as saying right from the get-go that I'm going to redo my whole timetable on the assumption that there is no requirement to work eight hours a day, just 40 hours a week. Now obviously you can't do this unless you also change what is on the agenda for each day at the same time as changing how many hours are available each day, which is why the clock scheduling and logic optimization become inseparable. That is what I mean by clock "concurrent" design.
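The scheduling analogy can be made concrete with a small timing calculation. The sketch below is illustrative only and is not taken from the ccopt tool: the three-register pipeline, the delay values, and the names `PATHS`, `CLOCK_PERIOD_PS`, and `setup_slacks` are all assumptions, and hold checks are ignored. It uses the standard setup relation, where slack equals the clock period plus the capture clock arrival minus the launch clock arrival minus the logic delay.

```python
from typing import Dict, List, Tuple

# Hypothetical three-register pipeline A -> B -> C (all numbers are made up).
# Each path is (launch register, capture register, logic delay in picoseconds).
PATHS: List[Tuple[str, str, int]] = [
    ("A", "B", 1200),
    ("B", "C", 800),
]
CLOCK_PERIOD_PS = 1000


def setup_slacks(clock_arrival_ps: Dict[str, int]) -> Dict[str, int]:
    """Return setup slack per path for the given clock arrival times.

    slack = period + (capture arrival - launch arrival) - logic delay
    Hold checks are deliberately ignored in this sketch.
    """
    slacks = {}
    for launch, capture, delay in PATHS:
        skew = clock_arrival_ps[capture] - clock_arrival_ps[launch]
        slacks[f"{launch}->{capture}"] = CLOCK_PERIOD_PS + skew - delay
    return slacks


# Traditional zero-skew schedule: every register sees the clock at the same time.
balanced = setup_slacks({"A": 0, "B": 0, "C": 0})

# Useful-skew schedule: delay the clock at B by 200 ps, lending time to A->B.
skewed = setup_slacks({"A": 0, "B": 200, "C": 0})

print("zero skew  :", balanced)
print("useful skew:", skewed)
```

With zero skew the A->B path misses timing by 200 ps while B->C has 200 ps to spare. Delaying the clock at B moves that spare time to where it is needed, so both paths meet timing at the same clock period. The total slack is the same in both schedules, which is the "40 hours a week" budget. Any change to the logic delays changes which schedule works, which is why the two have to be optimized together.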
Q: You also emphasize the importance of propagated clocks versus ideal clocks. Why?
A: Before you build the clocks, there are no clocks. So, you have this magic "ideal" model of a clock that arrives at all the registers at the same time within a global static margin of uncertainty. After the clocks are built you change this model into a true "propagated" model of clocks where you really time the delays through the actual clock network. In this propagated-clock world, you don't need a static margin for all registers; the margin can depend on exactly which registers and which logic function you are looking at.
The difference between ideal and propagated clock margins is becoming very significant, and there is no point doing clock concurrent optimization based on ideal clocks. But again this creates another chicken-and-egg problem: we don't know how best to build the clocks until after we have built the clocks! This is another reason why "concurrent" is such a key word for this technology.
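A small calculation shows how the two clock models can disagree about the same paths. This is a sketch under stated assumptions, not the way any particular timing engine works: the `Path` fields, the latency and margin values, and the function names are hypothetical. The ideal model subtracts one global uncertainty from the period. The propagated model uses each register's actual clock latency and a path-specific margin.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Path:
    name: str
    logic_delay_ps: int
    launch_latency_ps: int   # clock network delay to the launching register
    capture_latency_ps: int  # clock network delay to the capturing register
    local_margin_ps: int     # assumed path-specific margin (for example, for OCV)


PERIOD_PS = 1000
GLOBAL_UNCERTAINTY_PS = 100  # single static margin applied to every path


def ideal_slack(p: Path) -> int:
    """Ideal clocks: same arrival everywhere, one global uncertainty margin."""
    return PERIOD_PS - GLOBAL_UNCERTAINTY_PS - p.logic_delay_ps


def propagated_slack(p: Path) -> int:
    """Propagated clocks: real clock latencies and a per-path margin."""
    skew = p.capture_latency_ps - p.launch_latency_ps
    return PERIOD_PS + skew - p.local_margin_ps - p.logic_delay_ps


# Two made-up paths with identical logic delay but different clock networks.
paths: List[Path] = [
    Path("p1", 880, launch_latency_ps=300, capture_latency_ps=310, local_margin_ps=20),
    Path("p2", 880, launch_latency_ps=320, capture_latency_ps=250, local_margin_ps=60),
]

for p in paths:
    print(p.name, "ideal:", ideal_slack(p), "propagated:", propagated_slack(p))
```

Under the ideal model both paths show the same 20 ps of slack. Once the assumed clock latencies and per-path margins are applied, p1 has 110 ps of slack and p2 fails by 10 ps, so optimization decisions made on the ideal numbers would target the wrong path.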
Q: How does the Cadence acquisition benefit Azuro and its customers?
A: The ccopt technology we developed is part of place and route. Customers would much prefer to use a fully integrated clock concurrent place and route solution with the ccopt capability inside it. Ultimately, at Azuro we were a bunch of very bright guys selling an engine to people who wanted to buy a car. At Cadence we now have the entire car to work with. The integration of our engine makes every part of the car faster and more efficient, and provides a much smoother overall driving experience for low-power, high-performance applications. We have the right technology and the right business model... finally!
So the next question is, why Cadence? The number one thing for me is that I think Cadence has the right vision with EDA360. The number two thing is that Cadence's management team truly embraces this vision and is executing on it aggressively.
Q: It sounds like clock concurrent technology has the potential to reshape the IC physical implementation flow. Would you agree?
A: Yes. I think clock concurrent design is going to be as big for EDA as physical synthesis was 10 years ago.