Digital IC implementation has been at the core of EDA for many years, but it is far from a solved problem. Today, advanced nodes such as 16nm and 14nm FinFET are demanding a new generation of powerful IC design tools. Existing technology is not sufficient to meet the challenges of advanced process nodes, the size and complexity of designs, and the insatiable demands for power, performance, and area.
In this interview Anirudh Devgan, senior vice president of the Digital and Signoff Group at Cadence, talks about key IC implementation challenges, new concerns at 16nm and below, the impact of FinFET transistors, and new approaches that are needed for placement, routing, and clock tree synthesis.
Q: Anirudh, what are the biggest IC implementation challenges that designers are seeing today at advanced nodes such as 16/14nm and 10nm?
A: I see three key areas of challenge. The first is that design sizes are getting huge. Not long ago, “large” designs were in the 20M- to 30M-instance range, but advanced-node designs today may be over 100M instances. With traditional place-and-route tools, you will have too many small blocks—perhaps 1M to 2M instances each—to deal with. This takes too long and it impacts TAT [turnaround time]. In fact, the need to handle large design blocks with rapid TAT is growing even at established 28nm and 40nm process nodes.
A second challenge is that the blocks have tight PPA [power, performance, and area] requirements, and those demands are getting more intense. Instead of designing at 500MHz, many engineers today are designing at 3GHz or 4GHz. Also, power envelopes are getting smaller and power is getting more difficult to control.
A third challenge is that a lot of tool flows are fragmented between place and route and signoff. Different engines are used to handle IC layout and timing analysis, resulting in a lack of consistency and a lot of iterations. Optimization focuses on individual tools, not the whole flow. We need much more integrated flows for IC layout, timing and power analysis, and RC extraction.
Perhaps the biggest overall challenge is meeting stringent PPA and TAT demands for SoCs at the same time. If a flow cannot deliver both, a tug-of-war develops between what appear to be conflicting goals. If you’re designing SoCs for high-end applications, you can’t afford to sacrifice PPA or TAT.
Q: How can IC implementation tools help SoC designers achieve their goals for both PPA and TAT?
A: To optimize TAT, we need massively parallel architectures that can handle huge designs. We need to take advantage of multi-threading on multi-core workstations, as well as distributed processing over networks of computers. For example, digital implementation systems should be able to handle blocks with up to 10M instances. Customers need full-flow speedups up to 10X over existing solutions.
To improve PPA, we need new approaches to optimization. We need full-flow, multi-objective technology that enables concurrent electrical and physical optimization. Timing-driven and power-driven optimization needs to be multi-threaded and layer aware, and be able to reduce dynamic and leakage power while still providing optimal performance.
Individual components of the digital implementation flow need to comprehend the complexities of the downstream flows that follow. Placement needs to be integrated with timing analysis and optimization. Placement and routing must be optimized to avoid congestion. Concurrent clock and datapath optimization is also needed—today, they are run separately.
Q: Designs have always gotten more complex as process nodes shrink. Are there other concerns that are unique to 16/14nm and below?
A: Yes. At 20nm and below, wire dimensions and lithography start to reach their limits, and digital designers must employ double patterning or even triple patterning on the interconnect between transistor layers. In addition to consuming more mask layers, double patterning results in additional design rules and complicates layout verification. Layout designers use colors to determine which features go on which masks, and any IC implementation tool used at 20nm or below must be “color aware.”
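As an illustration of what "color aware" means in practice, double-patterning mask assignment is commonly modeled as two-coloring a conflict graph: shapes that sit closer than the single-mask spacing limit must go on different masks. The sketch below is a minimal, hypothetical example of that formulation, not a description of any particular tool; the function name `assign_masks` and the shape and conflict encoding are assumptions made for illustration.

```python
from collections import deque


def assign_masks(num_shapes, conflicts):
    """Assign each shape to mask 0 or mask 1.

    conflicts is a list of (a, b) index pairs for shapes that are too close
    to share a mask. Returns a list of mask ids, or None when the conflict
    graph contains an odd cycle and cannot be split across two masks.
    """
    adjacency = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    mask = [None] * num_shapes
    for start in range(num_shapes):
        if mask[start] is not None:
            continue
        mask[start] = 0
        queue = deque([start])
        # Breadth-first search: every neighbor gets the opposite mask.
        while queue:
            shape = queue.popleft()
            for neighbor in adjacency[shape]:
                if mask[neighbor] is None:
                    mask[neighbor] = 1 - mask[shape]
                    queue.append(neighbor)
                elif mask[neighbor] == mask[shape]:
                    return None  # odd cycle: a coloring conflict
    return mask


# Three wires in a row: neighbors conflict, so the masks alternate.
chain = assign_masks(3, [(0, 1), (1, 2)])

# Three mutually conflicting shapes cannot be split across two masks.
triangle = assign_masks(3, [(0, 1), (1, 2), (2, 0)])
```

In this sketch, a `None` result corresponds to a layout that has to be changed, for example by moving a shape, before it can be decomposed onto two masks.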
At 28nm and 20nm, wire delays dominate timing over gate delays, because wires haven’t scaled with the transistors. Designers are using local interconnects, but they require new layers, rules, and connectivity models to manage. Also, at smaller nodes, devices generally have higher leakage current even in the off state, so the total power dissipated can be much higher than expected.
There are other concerns. Smaller nodes have about 1,000 new design rules to address, and more than 400 new advanced layout rules for the 1X layers. You also need to account for variable thickness in the metal stack and increasing wire resistances that emerge at higher level metal layers.
Q: What happens with multi-corner timing analysis below 20nm?
A: The number of corners explodes, making MMMC [multi-mode, multi-corner] timing closure more complex and complicating ECO handling. If you consider that a “view” is a combination of an operating mode and a PVT [process, voltage, and temperature] corner, such as worst-case temperature, there could be over 1,000 potential views in an advanced-node SoC. In reality no one is going to analyze that many views, but techniques like massive parallelism and multi-scenario acceleration are needed to speed the process.
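To see how the view count can pass 1,000, here is a small sketch that enumerates views as the product of operating modes and PVT corners. All of the mode names and corner values below are hypothetical, chosen only to make the arithmetic concrete; real designs define their own.

```python
from itertools import product


def enumerate_views(modes, process, voltage, temperature):
    """Return every (mode, process, voltage, temperature) combination."""
    return list(product(modes, process, voltage, temperature))


# Hypothetical values for illustration only.
modes = [f"mode{i}" for i in range(9)]        # e.g. functional, test, sleep
process = ["ss", "tt", "ff", "sf", "fs"]      # process corners
voltage = [0.70, 0.75, 0.80, 0.85, 0.90]      # supply levels in volts
temperature = [-40, 0, 25, 85, 125]           # degrees Celsius

views = enumerate_views(modes, process, voltage, temperature)
print(len(views))  # 9 * 5 * 5 * 5 = 1125
```

Because the count is a product, adding one more value on any axis multiplies the total, which is why the full set is usually pruned to a smaller group of dominant views.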
Q: Most processes below 20nm use a new kind of transistor called a FinFET. Does this add new IC design challenges?
A: Yes. The 3-dimensional FinFET transistor goes hand-in-hand with double patterning, adding to the process complexity. It introduces tighter restrictions on cell placement for fin alignment and pin access. FinFETs provide much higher drive strength than planar transistors; improve performance, area, and layout; and have very good leakage characteristics. The routing, however, has more RC per distance, increasing dynamic IR drop and making optimization much more important.
Q: Power reduction has been a problem for many years. Is further work needed for advanced nodes?
A: Methods for reducing dynamic and leakage power must be re-examined at every process node. What works for one node may not work for another. For example, FinFETs have inherently good leakage control. However, as performance in FinFET devices gets pushed to the limit, you’ll need to carefully manage leakage power to avoid increasing the power dissipation.
Q: Looking specifically at placement, what are the current limitations and what needs to change?
A: We really need to change the way placement is done in order to improve PPA. Today placement is color aware, but is only lightly integrated with engines such as timing analysis and optimization. What’s needed is a much tighter integration with other engines in the IC implementation flow. Besides being aware of pin access and patterning, placement needs to be aware of routing and timing impacts.
Q: How can routing help maximize PPA?
A: Routing congestion is a big problem with advanced-node ICs and is not easy to resolve. Getting the best possible routing starts with good floorplanning and placement. It also depends on layer assignment, since wire delays are considerably less on the upper metal layers. You can get a big timing gain by routing long and critical nets on the upper layers, but routing resources here are limited.
A route-aware optimization capability should be able to identify timing-critical nets, avoid congestion on the upper layers, and rebuffer nets to improve timing. Routing optimization should also uncover signal integrity issues before detailed routing.
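The timing gain from routing a long net on an upper layer can be estimated with the standard Elmore delay model for a driver, a distributed RC wire, and a load. The sketch below is a rough illustration under assumed numbers: the per-micron resistance and capacitance values, the driver resistance, and the load are hypothetical and do not come from any real process.

```python
def elmore_delay_ps(length_um, r_per_um, c_per_um, r_driver, c_load):
    """Elmore delay in picoseconds for driver -> distributed wire -> load.

    Resistances are in ohms and capacitances in femtofarads, so each
    product is in femtoseconds and is divided by 1000 to get picoseconds.
    """
    r_wire = r_per_um * length_um
    c_wire = c_per_um * length_um
    driver_term = r_driver * (c_wire + c_load)
    wire_term = r_wire * (c_wire / 2 + c_load)
    return (driver_term + wire_term) / 1000.0


# Hypothetical parameters for a 1000 um net (illustration only).
net = dict(length_um=1000, c_per_um=0.2, r_driver=200, c_load=2)

lower_layer = elmore_delay_ps(r_per_um=10.0, **net)  # thin, resistive metal
upper_layer = elmore_delay_ps(r_per_um=0.5, **net)   # thick, low-resistance metal

print(f"lower layer: {lower_layer:.1f} ps, upper layer: {upper_layer:.1f} ps")
```

The wire term grows with the square of the length, so the benefit of a low-resistance upper layer is largest on long nets. That is one reason the limited upper-layer tracks are best reserved for long, timing-critical nets.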
Q: Does clock tree synthesis need to improve?
A: Traditional clock mesh and tree structures have limitations when it comes to closing critical timing paths at the desired frequency. That’s because they are optimized for lower frequencies, where the goal is to balance the clock for zero or fixed skew.
Q: Finally, will better digital IC implementation tools encourage more use of advanced-process nodes?
A: There will always be those who stick with mature process nodes for cost reasons. However, having a tool flow that’s up to the task is absolutely essential for designs at 16/14nm and 10nm. Advanced techniques such as massive parallelism and concurrent electrical/physical optimization will make it possible to construct integrated flows that allow engineers to meet both PPA and TAT requirements, making a tug-of-war between conflicting goals unnecessary.