At the recent TSMC OIP Ecosystem Forum, there were two special presentations by TSMC "before" the usual partner presentations. I put "before" in quotes since the event was virtual, so in practice, you could watch stuff in any order, unlike when OIP has been in-person in the past.
The first of these was by YK Cheng, Director of Design Solution Exploration and Technology Benchmarking at TSMC. His presentation was on TSMC's N3 (3nm) process for HPC, full title N3 HPC Design and Technology Co-Optimization (DTCO). The second was on 3D packaging, which I will cover in a separate post next week.
First, let me explain DTCO since it is a relatively new concept. I have written about it in earlier posts.
In the past, when Moore's Law was in full swing, we would scale the process without really looking at design, then throw the design rules over the wall and let the standard cell and memory designers have at it. That doesn't really work anymore, since the developers of the process now have to take account of the impact of their decisions on the designers. Something that is expensive in the process might make a huge difference to the designers, and thus to the size of the eventual design, which is what people really care about. For example, via pillars are difficult to manufacture, but they can make the routed design a lot smaller. Until recently, the focus of DTCO has been on what features in the process are required to remove another track from the standard-cell library. So now you know what DTCO is, read on...
TSMC's N3 (regular N3, that is) offers about a 10% performance improvement over N5. N3 HPC offers a 3% block-level speed improvement over N3, plus an additional 9% speed boost from HPC DTCO, for a total of about 12%. The test design is an Arm Cortex-A78.
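As a quick sanity check on those headline numbers, the 3% and 9% figures land at roughly 12% whether you simply add them or compound them (the talk doesn't say which way TSMC combines them); a minimal sketch:

```python
# Sanity-check the quoted N3 HPC gains: ~3% block-level over N3,
# plus ~9% from HPC DTCO. The talk quotes "a total of 12%".
# Note: whether the gains combine additively or multiplicatively
# is an assumption here; both land at roughly 12%.
base_gain = 0.03   # block-level speedup over regular N3
dtco_gain = 0.09   # additional boost from HPC DTCO

additive = base_gain + dtco_gain                       # 0.12
compounded = (1 + base_gain) * (1 + dtco_gain) - 1     # 0.1227

print(f"additive:   {additive * 100:.1f}%")    # 12.0%
print(f"compounded: {compounded * 100:.1f}%")  # 12.3%
```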
So what are the knobs that TSMC could turn? Starting at the base are process improvements: a larger contacted poly pitch (CPP) and taller cells. On top of that is an improved BEOL (metal stack) and super-high-density MiM (metal-insulator-metal capacitors), with selected metal-pitch combinations and an IR/EM signoff flow that integrates MiM design. On top of that is a series of HPC-optimized cells: faster flip-flops, double-height cells, and cells making use of via pillars. And on top of that, there are knobs to optimize place and route (see below).
YK went into a little more detail on the four levels in the triangle in the diagram above.
Process enhancements (larger CPP and taller cells) give a 10% speed gain (at the same power) over existing HC cells.
For HPC-centric BEOL design, coping with longer interconnects and the corresponding wire delay is often a big challenge. In mobile, the minimum metal pitch is used due to the need for density scaling. However, HPC applications often call for a larger metal pitch (lower RC) and larger vias (lower resistance). TSMC created special metal-pitch combinations and design rules to strike a good PPA tradeoff. The result is a 2-4% gain in performance.
MiM capacitors are essential in HPC designs to prevent voltage droop and improve performance, so TSMC created a super-high-density MiM with both good density and good frequency response. This results in less voltage droop, leading to about a 3% speed gain.
Next, YK discussed the standard-cell library changes, with architecture and layout optimization. These result in about a 2% speed gain, with about 35% HPC-cell usage on critical paths. There are over 400 new HPC cells.
The changes in the library consisted of:
The focus so far has been on higher speed, but there are additional knobs that can be used to get lower power, too, using DTCO. TSMC can maintain about 10% faster speed but with a smaller area, reducing power by 15%. Area reduction obviously helps logic density, but it also helps speed due to shorter wires (reduced R).
For HPC designs, the power distribution network (PDN) is becoming more and more important. It is key to reducing IR drop, and so to improving speed, and to maintaining a good balance with P&R resources. TSMC has developed a special design flow. First, it allocates power and ground in a more clustered manner, which makes room for signal routing with less obstruction; clock-net routing then performs better, with reduced skew, which results in better performance. Next, in a sort of cleanup phase, it inserts power and ground wires and vias wherever possible after P&R. This results in about a 1% speed gain.
Interconnect delay, both wires and vias, accounts for a lot of the overall delay in advanced technologies, especially for HPC. An efficient EDA flow is needed:
These P&R solutions provide about a 1.5-2% speed gain.
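It can be instructive to tally the individually quoted DTCO knob gains above (BEOL 2-4%, MiM ~3%, HPC cells ~2%, PDN ~1%, P&R 1.5-2%). A minimal sketch, assuming the gains compound multiplicatively (the talk does not say how TSMC combines them, and the baselines for the individual figures differ, so this is only a rough cross-check):

```python
# Compound the per-knob DTCO speed gains quoted in the talk.
# Assumption: gains combine multiplicatively; the talk does not
# specify this, so treat the result as a rough cross-check only.

def compound(gains_pct):
    """Combine percentage speed gains multiplicatively; return total percent."""
    total = 1.0
    for g in gains_pct:
        total *= 1.0 + g / 100.0
    return (total - 1.0) * 100.0

# BEOL, MiM, HPC cells, PDN, P&R -- low and high ends of the quoted ranges
low = compound([2.0, 3.0, 2.0, 1.0, 1.5])
high = compound([4.0, 3.0, 2.0, 1.0, 2.0])
print(f"DTCO knobs compound to roughly {low:.1f}%-{high:.1f}%")
```

Under that assumption the knobs compound to roughly 10-13%, in the same ballpark as the ~9% HPC DTCO boost quoted at the top of the post.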
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.