Paul McLellan
Tags: Memory, deep learning, eflash, flash, envm, RRAM, MRAM, PCRAM, edram

IEDM: Embedded Memories

28 Jan 2019 • 6 minute read

On the Sunday of IEDM there are two short courses, one memory-focused and one logic-focused. I always attend the logic one since that is more relevant to the broad semiconductor industry, and to EDA in particular. But SoCs mostly (all?) have memories on them too, so even the logic course has a session on memory. This year, Eric Wang of TSMC presented Embedded Memory: Present Status, Architecture, and Technology for Emerging Applications. His presentation is nearly 70 slides long, so I'm not going to pretend to tell you everything he said. I will keep to a fairly high level, for people who use embedded memories. If your job is to design embedded memories, I think you need to do more research than "reading a blog on the internet".

The Mainstream

 The above table is a sort of taxonomy of memories, non-volatile in blue, and volatile in yellow. I'm going to assume you know what all the acronyms mean. After all, I probably can't tell you anything interesting about the future of DRAM if you are starting from a point where you don't know what DRAM is. I should also emphasize that we are talking about embedded memories, not standalone memories. This means that they need to be built on a standard CMOS logic process, perhaps with some extra masks. We are not going to build a 3D NAND structure as an approach to embedded flash, for example.

Today, the main workhorse memory is SRAM, which is high-performance and can support a wide range of supply voltages. It is compatible with the logic process, requiring no additional process steps, and it benefits from scaling, improving in performance and density along with the standard cells. For specific applications, eDRAM and eFlash can be used, but they require additional masks and processing.

The trend is toward higher bandwidth and capacity, driven by AI (and other HPC applications) in cloud/server, and by graphics/image processing in client/mobile.

SRAM

Eric went over the challenges for SRAM cell design:

  • there is a shrinking read/write window at low VMIN
  • quantized device sizes in SRAM cells (because FinFET transistors come in fixed-size increments, not due to quantum effects) affect the read/write margins
  • external circuit assist techniques are required to get around this

The next challenge is high interconnect resistance (everywhere, of course, but here we're talking about inside the SRAM). At 28nm, a 256-bit M0 bitline had a resistance of under 50Ω, but at 7nm it is over 1000Ω, so maybe 25-30 times as much. Repeaters can be added, but they impact density. One solution is to double up the wordline, running it on two metal layers in parallel. Adding a flying bitline too can improve array access time by about 40%.
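
As a back-of-the-envelope illustration of why the doubled-up wiring helps (my own sketch, not from Eric's slides; the 20fF capacitance figure is an assumption), the Elmore delay of a distributed RC wire scales with the product of its total resistance and capacitance, so running the line on two metal layers in parallel roughly halves the wire delay:

def elmore_delay(r_total_ohm, c_total_f):
    # Elmore delay of a uniformly distributed RC wire: R*C/2
    return 0.5 * r_total_ohm * c_total_f

C_BITLINE = 20e-15  # assumed 20 fF total line capacitance (illustrative)

for node, r_ohm in [("28nm", 50.0), ("7nm", 1000.0)]:
    single = elmore_delay(r_ohm, C_BITLINE)
    strapped = elmore_delay(r_ohm / 2, C_BITLINE)  # two layers in parallel
    print(f"{node}: one layer {single * 1e12:.2f} ps, strapped {strapped * 1e12:.2f} ps")

Even this crude model shows how the 20-30x jump in resistance turns a once-negligible wire delay into a significant fraction of the access time.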

eDRAM

Embedded DRAM (eDRAM) is attractive since it bridges the latency, capacity, and bandwidth gaps between on-chip SRAM and off-chip DRAM. There are two primary architectures: deep trench capacitor, and stacked capacitor. The deep trench capacitor is embedded below the transistor, whereas the stacked capacitor is incorporated into the BEOL interconnect. Both obviously require several additional masks and process steps, so the attractiveness of the two alternatives depends largely on cost.

eNVM

Embedded non-volatile memory (eNVM) today means embedded flash (eFlash). It is important for embedded MCUs, which are widely used in automotive and IoT devices. It is attractive (compared to off-chip flash) due to lower BOM cost and smaller form factor. Being on-chip, it has lower latency and power, so it is attractive for holding executable code. Plus, the code is not directly accessible from off the chip, which is important for security. Since eFlash is writable, it can be used for over-the-air updates of the code.

 From a manufacturing point of view, eFlash is compatible with the logic process, and supports the high temperatures required for automotive, aerospace, and some other applications. The table above compares eFlash to off-chip NAND flash for automotive applications.

There are three mainstream eFlash cells, as in the above table. The floating-gate approaches have challenges scaling, since both the oxide thickness and the cell area get smaller (leaving less charge to hold each data bit). There are big problems scaling eFlash to 28nm and beyond, due to process complexity, reliability, and cost. It is challenging to integrate eFlash with HKMG, let alone FinFET. New eNVMs have advantages over eFlash in advanced nodes.

OTP Memory

One-time-programmable (OTP) memory is memory whose bits can be programmed once, at or soon after manufacturing. There are two main techniques: metal fuses (which work like the old fuses in house wiring: the interconnect is blown by passing a high current through it to program a bit) and antifuses (where a high voltage is used to punch a hole through the gate oxide to short out the gate). The antifuse approach has much smaller area, but requires a higher voltage for programming. It also has a security advantage: metal fuses can be visually inspected to extract the code, whereas the antifuse approach produces no visible difference between bits programmed as 1 or 0. Neither approach requires additional masks: the fuse approach uses normal metal, and the antifuse approach uses the normal gate oxide.

Memory for Deep Learning

Eric talked about the memory wall for deep learning. For more details on this, see my post Neural Nets Hit the Roofline—Memory for AI. The fundamental problem with memory in an AI context is that data access dominates the computation energy and latency. It doesn't matter how fast your computational core is if the memory is the bottleneck. The pie charts show two typical computations, with the blue segment being computation energy and the yellow being memory energy. Or, as I said in the concluding sentence of the post linked to above:

if you are involved in AI systems, don't overdesign the compute at the expense of memory bandwidth—you will be disappointed.
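
To make that concrete, here is a minimal roofline sketch (my own illustration, not from the talk; the peak-compute and memory-bandwidth figures are invented): attainable throughput is the lesser of the compute peak and memory bandwidth times arithmetic intensity.

PEAK_TFLOPS = 100.0  # assumed compute peak, TFLOP/s (illustrative)
MEM_BW_TBPS = 1.0    # assumed memory bandwidth, TB/s (illustrative)

def attainable_tflops(flops_per_byte):
    # Roofline model: min(compute peak, bandwidth * arithmetic intensity)
    return min(PEAK_TFLOPS, MEM_BW_TBPS * flops_per_byte)

for ai in [1, 10, 100, 1000]:
    print(f"{ai:4d} FLOPs/byte -> {attainable_tflops(ai):6.1f} TFLOP/s")

With these numbers, a kernel doing only 10 FLOPs per byte fetched tops out at 10 TFLOP/s no matter how many MACs you add, which is exactly the point of the quote above.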

Two solutions are to put computation in the memory chips to improve energy efficiency (in-memory computing) or at least to put the memory in the same package (near-memory computing). Emerging memory technologies such as MRAM and RRAM are attractive for deep learning (see the next section). They are both non-volatile with low standby leakage, but they add process complexity and cost.

Emerging Memory Technologies

Replacing eFlash is a high bar. It requires 10-year data retention, greater than 1M cycles of endurance, and less than 10ns access time.

The options are MRAM (magnetic RAM), RRAM (resistive RAM), and PCRAM (phase-change RAM). The table above summarizes the three technologies. They are all faster and cheaper than eFlash. One big plus is that they are built in the BEOL, in the interconnect stack, and so sit over the CMOS logic transistors. There are also other tradeoffs that can be made within a technology, such as reducing endurance in exchange for a bigger read window (an endurance of 1000 cycles is fine for code storage, for example), or relaxing speed/power for higher endurance.

Some of these technologies are potentially attractive for novel AI approaches. RRAM in particular can be used to store an analog value (as a resistance), not just a 0 or 1. Potentially, this means the weights of a neural net could be held in analog form, with obvious savings in area (your brain stores its weights in analog form). This is not a simple change, since it requires developing RRAM in a way that holds a stable resistance in the face of process variability and temperature, as well as developing neural-net circuitry that can use the analog weights. Let's say it is an idea with potential.
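
As a toy illustration of the variability problem (again my own sketch, nothing from the talk; the conductance range and the 5% device variation are invented numbers), here is what device noise does to a dot product computed from weights stored as conductances:

import random

G_MIN, G_MAX = 1e-6, 1e-4   # assumed programmable conductance range, siemens
SIGMA = 0.05                # assumed 5% device-to-device variation

def weight_to_conductance(w):
    # Map a weight in [-1, 1] linearly onto the conductance range
    return G_MIN + (w + 1.0) / 2.0 * (G_MAX - G_MIN)

def conductance_to_weight(g):
    return 2.0 * (g - G_MIN) / (G_MAX - G_MIN) - 1.0

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(64)]
inputs = [random.uniform(0, 1) for _ in range(64)]

ideal = sum(w * x for w, x in zip(weights, inputs))
noisy = sum(
    conductance_to_weight(weight_to_conductance(w) * random.gauss(1.0, SIGMA)) * x
    for w, x in zip(weights, inputs)
)
print(f"ideal dot product {ideal:+.3f}, with device variation {noisy:+.3f}")

Read circuitry, programming accuracy, and drift over temperature all add further error on top of this.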

Summary

SRAM continues to be the workhorse memory in the modern SoC but faces increasing challenges to scale power, performance, and area in advanced nodes.

eFlash enables MCUs for low-power IoT and automotive applications, but its scaling faces major challenges at 28nm and beyond.

New AI and HPC applications demand efficient memory access to reduce system power and latency.

Emerging memory technologies and architectures are in development to address the memory wall for future applications...but not really ready yet.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.