System in Package, Why Now? Part 2

11 Dec 2019 • 6 minute read

This post is a continuation of last week's post Multiple Die in Packages. Why Now? That post looked at several of the drivers for system integration increasingly being done using 3D packaging technologies rather than integrating everything onto a huge SoC.

Heterogeneity

Another motivation for separate die is not just to split up a design in a single process, but to package die from different processes.

Sometimes there are economic reasons. Several presentations at HOT CHIPS had a partition of the design into the processor itself, and an I/O part of the design. The processor could be manufactured in the most advanced and expensive node, and the I/O in a less advanced and cheaper node (typically, it seemed, one generation behind). The image at the start of this post is Intel's Lakefield, with a base I/O die (in a non-leading edge process, I think 14nm), the processor in 10nm, and in-package DRAM on the top. This is all assembled using Intel's 3D approach that they call Foveros.

The reason for doing this is two-fold. The most obvious is that the I/O interfaces don't benefit from the more advanced node. And in the modern era, advanced nodes are more expensive per transistor, so the economic push is to hold back, not to move to the advanced node as aggressively as possible. But there is also a second, more subtle reason. All the I/O (and other routine blocks) have already seen silicon, either in production or at least in test chips. If the I/O die is also done in the most advanced process, then test chips for things like high-speed SerDes become part of the critical path to getting the whole system out.

RF and analog benefit even less from being in the advanced node. In fact, not only do they not benefit, it is a positive disadvantage. It is very difficult to design analog circuits in FinFET processes. The reason is that FinFETs are quantized. Transistors have a uniform and fixed length, and the width is an integer number of fins. In planar processes, the analog circuit designer could pick the widths and lengths of the transistors. Often in an analog design, what is most important is the ratio between the sizes of critical transistors. But in FinFET you can't have two transistors with an arbitrary ratio like that, so analog design is much harder. It makes much more sense to keep analog design back in a planar process like 28nm, or perhaps even a less advanced node such as 65nm where perhaps the design (an ADC, say) has already been well characterized and seen high-volume production.
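To make the quantization point concrete, here is a minimal sketch of what happens when a designer wants a particular size ratio between two transistors but can only choose whole numbers of fins. It assumes, as a simplification, that effective width scales linearly with fin count and that gate length is fixed. The function name and the target ratio are made up for illustration, and real designs have other knobs (fingers, stacking) that this ignores.

```python
def best_fin_ratio(target: float, max_fins: int) -> tuple[int, int, float]:
    """Find the fin counts (fins_a, fins_b) whose ratio is closest to target.

    Hypothetical helper: assumes width is proportional to fin count and
    that each transistor may use between 1 and max_fins fins.
    Returns (fins_a, fins_b, relative_error).
    """
    best = None
    for fins_b in range(1, max_fins + 1):
        for fins_a in range(1, max_fins + 1):
            err = abs(fins_a / fins_b - target) / target
            # Keep the first, smallest pair on ties.
            if best is None or err < best[2]:
                best = (fins_a, fins_b, err)
    return best


# A ratio that a planar designer could simply draw, such as 1.37,
# can only be approximated with a limited number of fins.
fins_a, fins_b, err = best_fin_ratio(1.37, max_fins=8)
print(f"closest ratio {fins_a}/{fins_b}, relative error {err:.1%}")
```

With a small fin budget the nearest achievable ratio is off by a couple of percent, which is the kind of mismatch a planar designer never had to accept.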

I'm not an RF expert, but I understand that it is next to impossible to design RF in FinFET processes due to the high capacitance of the FinFET transistors themselves. It's possible that the high resistance of the interconnect is also an issue for RF.

Another area where it can be attractive to use separate die is for photonics. Even if some of the photonics is on the main die, it is unlikely that the lasers themselves can be. Usually they are InP (indium-phosphide). As it happens, the Intel keynote at Cadence's recent Photonics Summit was on building two die solutions and then attaching the two wafers face to face. (See my post The Photonics Summit 2019: Hybrid Lasers.)

At HOT CHIPS, Ayar Labs presented their TeraPHY, which is a small optical chip that can be added into the package for an SoC to provide optical connectivity. See the diagram alongside.

Chiplets

So far the assumption in all the discussion about 3D designs with multiple die in the package is that the die are all designed by the same team, or at least the same company, with the exception of DRAMs which always come from specialized DRAM manufacturers. DRAM has to be manufactured at scale to be competitive, and "at scale" means a whole fab at a time.

But there is another possibility, which is that in-package components become available commercially. These are known as chiplets. There are several challenges to this. There are some technical ones, but they are the same as for all the other in-package integration that I've already discussed. But there are two further challenges, standardization and market. In fact, Cadence is involved in a program addressing some of this. (See my post ERI: CHIPS and Chiplets.)

If the same team is designing two die that have to go in the same package, they can pretty much choose any communication scheme they like. But if the chiplets are standard in some sense, for example, a high-speed SerDes chiplet, or a WiFi chiplet, then the SoC has to use whatever interface the chiplet provides. To keep things simple, it is better if the interfaces are well-proven and standard. Inside a package, the distances are short and so it doesn't make sense to use the same type of long-reach SerDes that would be appropriate to run across a backplane. Another advantage inside a package is that it is relatively cheap to have a lot of connections compared to running through a package onto a board (for example, wide-memory can have thousands of connections instead of trying to get all the data across in eight or nine lanes).
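The wide-versus-narrow tradeoff is just arithmetic, and a rough sketch shows why many slow connections are attractive inside a package. The numbers below are illustrative assumptions, not figures for any particular memory or SerDes, and the function names are made up for this example.

```python
def aggregate_gbps(connections: int, gbps_per_connection: float) -> float:
    """Total bandwidth of an interface with identical connections."""
    return connections * gbps_per_connection


def rate_needed_per_lane(target_gbps: float, lanes: int) -> float:
    """Per-lane rate a narrow interface needs to match a target bandwidth."""
    return target_gbps / lanes


# Assumed example: 1024 data connections, each running at a modest 2 Gbps.
wide = aggregate_gbps(connections=1024, gbps_per_connection=2.0)

# To carry the same traffic over 8 lanes, each lane has to run far faster.
per_lane = rate_needed_per_lane(wide, lanes=8)

print(f"wide interface: {wide:.0f} Gbps total")
print(f"8-lane equivalent: {per_lane:.0f} Gbps per lane")
```

Under these assumptions each of the eight lanes would have to run over a hundred times faster than a single wide connection, which is why cheap in-package wiring changes the design choice.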

As it happens, Cadence just announced the UltraLink D2D PHY IP and a test chip (or test chiplet) to demonstrate it in silicon. (See my post Die-to-Die Interconnect: The UltraLink D2D PHY IP.) This has our 40Gbps SerDes. It has been designed to be very low power, and also maximize connectivity across the edge of the chiplet (sometimes called beachfront) without requiring expensive manufacturing processes due to very tight pitches.

The dream of proponents of the chiplet approach is that a marketplace for known-good-die chiplets comes into existence, and so just like you can purchase HBM in the open market, you will be able to purchase a wide range of chiplets. Design becomes more like board-level system design, with purchased standard components, and perhaps a single SoC designed as the heart of the system.

I'm a bit skeptical that this will happen, since the problems of inventory seem hard to deal with. When I was at VLSI Technology we were always challenged by gate-array base inventory. The promise of a gate-array design is that the bases are all pre-diffused and held in a wafer bank. That worked fine for simple designs in very low volume. It was a hard tradeoff. Any wafer sitting in the wafer bank is money tied up and depreciating (and, if a new process generation is coming up, perhaps becoming obsolete). On the other hand, the promise of gate-arrays was that the wafer bank would be available, and the turnaround time for an order would be short (in those days, just adding three layers of metal to the banked wafer). And that's before you consider that we needed a base wafer with various ratios of memory to gate fabric.

But the value proposition would be:

Flexibility in picking the best process node for the part—in particular, SerDes I/O and analog does not need to be on the "core" process node
Better yield due to small die size
Shorten IC design cycle and integration complexity by using pre-existing chiplets
Lower manufacturing costs by purchasing known-good die (KGD)
Volume manufacturing cost advantage when the same chiplet(s) are used in many designs

The first couple of bullets are the same for any system-in-package solution. The other three are strongest if you can simply buy chiplets from a distributor, but they are also mostly true if the chiplets have to be manufactured specially for the particular system. The promise is that you can design systems like this, a 25.6Tbps switch with 112G SerDes chiplets, as opposed to having to integrate all the SerDes interfaces onto the big core SoC itself.
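The yield bullet in the list above can be illustrated with the simple Poisson yield model, where yield falls off exponentially with die area times defect density. This is only a sketch under stated assumptions: the defect density and die areas are invented for illustration, bad chiplets are assumed to be screened out before assembly (known-good die), and packaging yield, test cost, and the area overhead of die-to-die interfaces are all ignored.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-A * D0), with A converted to cm^2."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_system(die_area_mm2: float, dies_per_system: int,
                            defects_per_cm2: float) -> float:
    """Average silicon area (mm^2) fabricated per working system.

    Assumes every die is tested and only good die are assembled.
    """
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_system * die_area_mm2 / good_fraction


D0 = 0.1  # assumed defect density in defects per cm^2, for illustration only

monolithic = silicon_per_good_system(600.0, dies_per_system=1, defects_per_cm2=D0)
chiplets = silicon_per_good_system(150.0, dies_per_system=4, defects_per_cm2=D0)

print(f"one 600 mm^2 die:    {monolithic:.0f} mm^2 of silicon per good system")
print(f"four 150 mm^2 dies:  {chiplets:.0f} mm^2 of silicon per good system")
```

Under these assumptions the four small die need noticeably less fabricated silicon per working system than the single large die, even though the total functional area is the same.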
