Paul McLellan
Tags: featured, linley processor conference, Cisco, 3DIC, datacenter

Linley: Chiplets for Infrastructure Silicon

9 Nov 2022 • 5 minute read

At the recent Linley Fall Processor Conference, R. "Suds" Sudhakar of Cisco gave the keynote titled Chiplets for Infrastructure Silicon: Hype, Hope or Hazy. First, just think about that: chiplets are so hot right now that even at a processor conference, one of the keynotes has very little to do with processors and is all about advanced packaging. Suds admitted that Cisco hadn't talked much about this topic until very recently.

The main part of the presentation was a history of Cisco's use of chiplets, which goes back ten years. Cisco even built its own custom DRAM chiplets because bare-die DRAM chiplets were not available at the time. This was before HBM (HBM1 was first introduced in 2013).

Suds started out with a dozen slides that laid out the argument for why this is all happening now, as opposed to in the distant past (or the far future, I suppose). The diagram on the right shows what he called "the 1964 tyranny of yield." If you want to build a really big function (more than one chip's worth), then you have to decide how to partition it. If any of the chips are too big, they don't yield, and if all the chips are too small, then the cost of manufacturing them all and connecting them all is high. This is the basis of Moore's Law: there is an optimal die size (from an economic cost point of view), and over time that optimal point moves as semiconductor manufacturing advances. The fact that Dennard scaling meant power did not increase made this a driver of the industry for decades, and the optimal size of a chip (in transistors) just kept going up.
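
To make the yield argument concrete, here is a minimal sketch (my own illustration, not from the talk) using a simple Poisson defect model, yield ≈ exp(−D·A). The wafer cost, defect density, and die areas are made-up numbers chosen only to show the shape of the curve:

```python
import math

WAFER_COST = 10_000.0   # assumed wafer cost in dollars (illustrative)
DEFECT_DENSITY = 0.1    # assumed defects per cm^2 (illustrative)
WAFER_AREA = 70_000.0   # usable area of a 300mm wafer in mm^2 (approximate)

def poisson_yield(die_area_mm2):
    """Fraction of die that are defect-free under a Poisson defect model."""
    return math.exp(-DEFECT_DENSITY * die_area_mm2 / 100.0)  # mm^2 -> cm^2

def cost_per_good_die(die_area_mm2):
    """Wafer cost spread over the die that actually work (edge loss ignored)."""
    good_die = (WAFER_AREA / die_area_mm2) * poisson_yield(die_area_mm2)
    return WAFER_COST / good_die

for area in (50, 100, 200, 400, 800):
    print(f"{area:4d} mm^2: yield {poisson_yield(area):6.1%}, "
          f"cost per good die ${cost_per_good_die(area):7.2f}")
```

Cost per good die grows much faster than linearly with area because yield falls off exponentially; that is the tyranny the slide illustrates.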

In 2015-2025, Suds says, we are at the crossover point where it is more economical to split a large die into smaller ones. You get higher (defect-related) yield, more die per wafer, and better mask-field utilization. And since analog scales poorly, the optimal point can be slid further down the cost curve by going to chiplets.
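
As a rough illustration of that crossover (again my own sketch with assumed numbers, not Cisco's model), compare one large monolithic die against the same logic split into four chiplets, charging the chiplet option extra area for d2d PHYs plus an assembly cost:

```python
import math

# Same illustrative Poisson yield model and assumed costs as in the sketch above.
WAFER_COST, DEFECT_DENSITY, WAFER_AREA = 10_000.0, 0.1, 70_000.0

def cost_per_good_die(area_mm2):
    yield_frac = math.exp(-DEFECT_DENSITY * area_mm2 / 100.0)
    return WAFER_COST / ((WAFER_AREA / area_mm2) * yield_frac)

MONOLITHIC_AREA = 800.0   # mm^2, illustrative
N_CHIPLETS = 4
D2D_AREA_OVERHEAD = 0.10  # assumed 10% extra area per chiplet for d2d PHYs
ASSEMBLY_COST = 30.0      # assumed per-package interposer/assembly cost

chiplet_area = (MONOLITHIC_AREA / N_CHIPLETS) * (1 + D2D_AREA_OVERHEAD)
mono_cost = cost_per_good_die(MONOLITHIC_AREA)
split_cost = N_CHIPLETS * cost_per_good_die(chiplet_area) + ASSEMBLY_COST

print(f"monolithic {MONOLITHIC_AREA:.0f} mm^2     : ${mono_cost:7.2f}")
print(f"{N_CHIPLETS} chiplets of {chiplet_area:.0f} mm^2: ${split_cost:7.2f}  (d2d and assembly taxes included)")
```

With these made-up numbers the chiplet option wins comfortably; with a lower defect density or a smaller monolithic die the ordering flips, which is exactly the crossover Suds described.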

Suds took care to define a chiplet in such a way as to exclude some of the integrations from the past, often called multi-chip modules or MCMs. His definition of a chiplet-based design is that it has:

  • Multiple bare die within a single package
  • The die are interoperable to deliver functionality
  • The die are not just pre-existing/standalone
  • Die-to-die (d2d) connectivity using short-reach inter-die PHYs with an emphasis on power/density (see the quick power estimate after this list):
    • <0.1 pJ/bit on-die
    • <0.5 pJ/bit parallel d2d
    • ~2 pJ/bit serial d2d
    • ~10 pJ/bit package-to-package
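
To put those energy-per-bit figures in perspective, power is just energy per bit times bit rate. A quick sketch (my own arithmetic, with an assumed 1 Tb/s of traffic) shows the spread:

```python
# Power = energy per bit x bit rate, for an assumed 1 Tb/s of die-to-die traffic.
BANDWIDTH_BPS = 1e12  # 1 Tb/s (illustrative)

links = {
    "on-die (<0.1 pJ/bit)":            0.1e-12,
    "parallel d2d (<0.5 pJ/bit)":      0.5e-12,
    "serial d2d (~2 pJ/bit)":          2e-12,
    "package-to-package (~10 pJ/bit)": 10e-12,
}

for name, joules_per_bit in links.items():
    print(f"{name:33s} -> {joules_per_bit * BANDWIDTH_BPS:5.1f} W")
```

Keeping traffic on short-reach parallel d2d links rather than going package-to-package is worth more than an order of magnitude in I/O power.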

Note that there are "taxes" for disaggregation. Suds calls these SSB for chiplets, the counterpart of PPA for SoCs. The disaggregation taxes are:

  • D2d power and area
  • Packaging and yield costs, plus more time in shipment
  • Partitioning overheads, design refactoring
  • Physical considerations
    • Floorplanning/symmetry
    • Co-planarity/warpage
    • Thermal and hot spots
    • Noise
    • Reliability/longevity

However, there are also some secondary benefits apart from yield and cost:

  • Bring memory die closer, such as LPDDR5 in the package
  • Bring logic die closer, giving lower energy and a smaller board footprint
  • Product diversity (mix and match)

Advanced Packaging Technology (APT)


Cisco Gen 1 and 1.5 Chiplets

Ethernet speeds are much greater than DRAM speeds. But routers deal with long routes, bursty flows, and congested links, so there is a requirement for deep buffering, and that means DRAM (SRAM capacity is inadequate). Cisco's solution was "specialty" chiplets: it built its own DRAM chiplet (this was before HBM was available).
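
The classic sizing rule for deep buffers is the bandwidth-delay product: buffer ≈ link rate × round-trip time. A quick sketch with assumed numbers (not figures from the talk) shows why on-chip SRAM runs out fast:

```python
# Buffer sizing by bandwidth-delay product: buffer_bits ~ link_rate * RTT.
LINK_RATE_BPS = 1e12  # 1 Tb/s of aggregate bandwidth (assumed)
RTT_SECONDS = 0.1     # 100 ms round-trip time on a long, congested path (assumed)

buffer_bits = LINK_RATE_BPS * RTT_SECONDS
print(f"required buffer: {buffer_bits / 8 / 1e9:.1f} GB")  # ~12.5 GB

# Even a few hundred MB of on-chip SRAM falls far short of this,
# which is why deep buffers end up in DRAM.
```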

That custom DRAM chiplet was Cisco's Gen 1 use of chiplets, about ten years ago. It was used in multiple systems and is still in production. The DRAM die was a full-custom in-house design, tapeout, and qual.

Cisco then did a Gen 1.5 design with a logic die plus two DRAM die (HMC or HBM). The biggest challenge was heat leakage from the ASIC to the DRAM (DRAMs hate to be hot), so this required custom heat sinks.

Cisco Gen 2 and Gen 2.5 Chiplets

The second generation was true 2.5D chiplets on a silicon interposer with two logic die. There were two versions of the design, one with two die for datacenters and one with just one die for enterprise blade servers. It used a semicustom PHY, modified from HBM to HBI, that Cisco co-evolved with the ASIC vendor.

The Gen 2.5 version was "2.1D" with an RDL routing interposer, with two logic die and two DRAM die. Again, Cisco created two versions: a dual-die version for modular systems (which could not be done with two separate chips for space and power reasons), and a single-die version for fixed systems, where one or more chips could be used.


Cisco Gen 3 Study

As cloud datacenters grew explosively, internal traffic (within the datacenter) exploded too, requiring terabit ASICs. A datacenter can hold a maximum of about 100K servers, so if you need more, you need multiple buildings nearby (but not too close: if one catches fire, you want the others to still be fine). This means there is a lot of regional traffic, lashed together with coherent long-distance optical links.

This was another 2.1D RDL interposer design, using either one hub die and 8 PPU die, or a "0.5X hub" plus 4 PPU die. This was all connected up with proprietary D2D IO.

Silicon Photonics

Co-packaged optics

3D chiplets shrink cost and volume. Suds said that the photo above was from the prior generation, and he was not authorized to show the current version.

The Hype and the Hope


Infrastructure silicon means large chips, and chiplets increase transistors per dollar. But it is much more important to reduce power than cost, since "connectivity costs electricity." Chiplets help with cost scaling, but power is the first-order constraint.

Based on Cisco's experience, Suds said:

Chiplets are unlikely to become the next industry vehicle for small IP, even if we do standardize D2D IO. It is the tyranny of space: rectilinear packing is hard.

Large digital IP (PCIe cards, say) might be possible, but we need to standardize power, cooling, and mechanicals across foundries and packaging houses.

Research

Suds ended with some pleas for future research and work:

  • All designers, eliminate waste
  • ASIC and system designers, what about serial memory?
  • SRAM and DRAM designers, how about more flexible ECC?
  • Circuit and logic designers, better flops (they are big today and still susceptible to SEEs, single-event effects)
  • Process and EDA engineers, how about Bostonian routing (allowing 45° routes, not just horizontal and vertical)?

As Robert Dennard (who, by the way, invented the DRAM as well as giving his name to Dennard scaling) said:

Yes, there's an end to scaling. But there is no end to creativity.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.
