Paul McLellan

Jumping Jack Flash

3 Nov 2020 • 8 minute read

This is the second post about non-volatile memory technologies. The first post was EPROM: Chips with Windows. Today we move to flash memory. Flash was invented at Toshiba in 1980 as a derivative of the EEPROM technology discussed in the first post, and commercialized in 1987. The biggest difference is that flash memories are manufactured in much larger sizes. Since the erase function in EEPROM/flash is slow, the larger memory is divided into pages, and the memory can be erased a page at a time.

Flash memories are very widely used. You almost certainly have gigabytes of it in your pocket, since it is what holds all your photographs and music in your smartphone. Solid State Drives (SSD) used for storage in laptops and data centers are built from flash. Despite flash being relatively slow to write, it is still much faster than hard disk drives (HDD), often called "rotating media" just to be really clear we're talking about actual disks.

Like the older EEPROM technology, flash memory has a limited number of erase cycles. However, this is largely hidden from the user by a sophisticated flash controller that does "wear leveling": keeping track of how many times each page has been erased, and maintaining a map from each logical page in the memory space to the physical page where it is actually stored. Otherwise, for example, the SSD block holding the root of the filesystem might run out of cycles long before anything else on the drive was close to wearing out. Usually a single controller handles many flash memory chips, rather than embedding a controller on every chip.
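To make that concrete, here is a minimal sketch of the bookkeeping involved, in Python, with hypothetical names. Real controllers erase whole blocks rather than single pages and defer erases with garbage collection, but the core idea of tracking erase counts and keeping a logical-to-physical map is the same:

    # Minimal wear-leveling sketch (hypothetical, simplified to page-granularity erase).
    class WearLevelingController:
        def __init__(self, num_physical_pages):
            self.erase_counts = [0] * num_physical_pages       # erases per physical page
            self.logical_to_physical = {}                      # logical page -> physical page
            self.free_pages = set(range(num_physical_pages))   # physical pages available

        def write(self, logical_page, data):
            # Flash cannot be overwritten in place: pick the least-worn free page,
            # write there, and remap the logical page to point at it.
            target = min(self.free_pages, key=lambda p: self.erase_counts[p])
            self.free_pages.remove(target)
            old = self.logical_to_physical.get(logical_page)
            if old is not None:
                # The previous physical page is now stale; erase it and return it to the pool.
                self.erase_counts[old] += 1
                self.free_pages.add(old)
            self.logical_to_physical[logical_page] = target
            # ... program `data` into physical page `target` ...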

I'm only going to talk about so-called NAND flash. The other type, NOR flash, has been largely (completely?) superseded by NAND; in particular, all the most modern flash memories, built using 3D technology, are NAND flash. NOR flash had low endurance and was slow to write and erase, but it had some niche advantages due to being able to read and program a single word rather than a whole page. It is also more expensive to manufacture, especially now that NAND flash is manufactured using 3D technology (see later in this post for more on that). NAND flash is the underlying technology for all those USB drives, and also for SD cards and similar memory cards.

The sizes of flash memories increased over time, driven by several factors:

  • Moore's Law and semiconductor scaling
  • Stacking die with through-silicon vias (TSVs)
  • Multi-Level Cell technology (MLC), storing more than one bit on each floating gate
  • 3D manufacturing

For the first decades of flash, just like everything else in semiconductors, the main driver was scaling of lithography and manufacturing to smaller dimensions. However, this hit a limit earlier than with SoCs, since the floating gate couldn't get too small or it would hold too few electrons to detect reliably. Unlike for DRAM, there wasn't an escape by going deep down into the wafer.
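A back-of-the-envelope calculation shows why. The numbers below are purely illustrative assumptions, not measurements of any real process, but they give a feel for how few electrons are involved: the stored charge is roughly the gate capacitance times the threshold-voltage shift, divided by the charge of one electron.

    # Illustrative only: assumed round numbers, not data for any particular process.
    Q_ELECTRON = 1.6e-19          # charge of one electron, in coulombs
    gate_capacitance = 1e-18      # ~1 attofarad, an assumed deeply scaled floating gate
    threshold_shift = 3.0         # assumed threshold-voltage shift between a 0 and a 1, in volts

    stored_charge = gate_capacitance * threshold_shift
    electrons = stored_charge / Q_ELECTRON
    print(f"Roughly {electrons:.0f} electrons separate a 1 from a 0")   # ~19 electrons

With numbers like these, losing or gaining a handful of electrons visibly shifts the threshold, which is why the floating gate couldn't keep shrinking.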

Another way of scaling was to thin the flash die and stack some number (say 16) of them on top of the flash controller, with TSVs used to connect up the stack. Although this is 3D packaging technology, this is not the same as 3D NAND (see below).

At the same time, multi-level cell technology was developed, with each memory element holding 2, 3, or 4 bits. This is done by controlling how many electrons are injected onto the floating gate, and then being able to detect the different amounts of charge with the sense-amplifiers.
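The arithmetic here is simple but unforgiving: n bits per cell means 2^n distinguishable charge levels on the same floating gate, so each extra bit doubles the number of levels the sense amplifiers must tell apart. A quick sketch (SLC/MLC/TLC/QLC are the usual industry shorthand for 1 to 4 bits per cell):

    # n bits per cell -> 2**n charge levels to distinguish on one floating gate.
    for bits, name in [(1, "SLC"), (2, "MLC"), (3, "TLC"), (4, "QLC")]:
        levels = 2 ** bits
        print(f"{name}: {bits} bit(s)/cell -> {levels} charge levels, "
              f"{levels - 1} read thresholds")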

3D NAND

The biggest development, though, was 3D NAND. Once scaling reached its limit, the only way to continue to increase the number of bits per die was to go vertical, building up multiple layers of memory elements on the wafer. This is limited by a couple of things. First, the more layers, the lower the yield. Second, a plasma etch eventually needs to be done through the entire stack, with an aspect ratio of something like 80:1. If the half-mile-high Burj Khalifa, the tallest building in the world, had an 80:1 aspect ratio, then the ground floor would be 33 feet square. The image above is from Coventor, which used to be partially owned by Cadence. The state of the art is now up to 96 layers, or perhaps even more.
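For anyone who wants to check the analogy, the arithmetic is just height divided by aspect ratio:

    # Checking the Burj Khalifa analogy in round numbers.
    height_ft = 0.5 * 5280             # "half-mile high" ~ 2,640 feet
    aspect_ratio = 80                  # roughly the aspect ratio of the 3D NAND channel etch
    ground_floor_ft = height_ft / aspect_ratio
    print(f"An 80:1 Burj Khalifa would be about {ground_floor_ft:.0f} feet across")   # ~33 feet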

eFlash

SoCs needed non-volatile memory, too. Until the last few years, the technology of choice was embedded flash or eFlash.

If the memory was just to hold a bootstrap or other data that was programmed at manufacture, then one-time-programmable (OTP) memory could be used. This came in two main types. The first was fuse technology, where thin metal fuses formed the programming element; these would literally be blown like a fuse with a high current. One downside of this approach, apart from the fact that the bits were fairly large, was that it was not very secure: you could see which fuses were blown and so read out the code. The other technology, antifuse, used a high voltage to punch a hole in the gate oxide under a transistor. This could be sensed by the electronics, since the insulator no longer insulated perfectly. However, the damage was tiny, hidden under the gate, and effectively impossible to reverse engineer.

Around 20nm, eFlash ran into the same scaling limit as standalone flash: the floating gate could no longer be scaled. However, using 3D NAND on an SoC was not economically feasible since it added too many layers and too much cycle time. I'm not sure if the processes would be compatible anyway—memory fabs typically have very different processes from logic fabs, with different economics. Typically, SoCs wanted small amounts of eFlash, otherwise it would be more economical to use a separate 3D NAND chip (or die).

Emerging Memories

There are three memory technologies that have been in development for a decade or more. In fact, one or more of these was expected to take over as the workhorse and replace DRAM, but that has not happened. Two of the three technologies are covered here (the third, 3DXpoint, gets its own section below):

  • MRAM for Magnetic RAM or Magnetoresistive RAM
  • RRAM or ReRAM for resistive RAM

Both of these technologies are constructed in the metal stack (the BEOL, or back end of line, in manufacturing terminology). This makes them easy to add to any process, in particular to an SoC.

As you might guess, MRAM does not store the data bits as charge but rather in a magnetic element. There are actually two different approaches. The first is known as the Magnetic Tunnel Junction (MTJ). The bit is two magnetic elements separated by a thin insulating barrier layer. One of the elements is not switched and is permanently magnetized. The other has its polarity in one of two directions, depending on whether a 1 or a 0 was written. The memory is written in a manner somewhat like the old magnetic ferrite core memories of my youth. It is read by measuring the resistance through the junction, which differs depending on whether the two magnetic layers are aligned or opposed (the tunnel magnetoresistance effect).
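Here is a toy model of the read side, with made-up resistance values purely for illustration (real parts have their own characterized resistances and read circuits):

    # Toy MTJ read model: low resistance when the free layer is parallel to the
    # fixed layer, high resistance when antiparallel. Values below are assumed.
    R_PARALLEL = 5_000        # ohms, assumed "0" state
    R_ANTIPARALLEL = 10_000   # ohms, assumed "1" state
    READ_THRESHOLD = (R_PARALLEL + R_ANTIPARALLEL) / 2

    def read_mtj(measured_resistance_ohms):
        """Return the stored bit implied by a measured junction resistance."""
        return 1 if measured_resistance_ohms > READ_THRESHOLD else 0

    print(read_mtj(5_200))    # -> 0
    print(read_mtj(9_800))    # -> 1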

The other MRAM approach, the one the foundries seem to use for SoCs, and built entirely in the interconnect stack, is STT-MRAM. STT stands for Spin-Transfer Torque. This uses a spin-polarized current (instead of a mixture of electrons with both types of spin), and some of the energy from that spin is used to flip the polarity of the free magnetic layer. Confusingly, given the name, what is actually read out is the resistance of the stack. I confess, I don't entirely understand all this. However, this is the technology that seems to have been adopted by all the foundries.

RRAM works by containing a memristor, an element whose resistance can be changed in a way that is preserved. As a basic memory technology, I've heard it described as "disappointing". One place it shows promise is that the resistance is an analog value, so it is possible to store intermediate values in an RRAM cell, in particular using a single cell to hold a weight for a neural network (which might be 4 bits, although there is inevitable inaccuracy). So this can be used to do some level of in-memory compute.
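As a sketch of that idea (all numbers below are assumptions for illustration, not characteristics of any real RRAM part), storing a roughly 4-bit weight in one cell amounts to quantizing it to 16 levels and accepting a small programming error:

    import random

    LEVELS = 16            # ~4 bits of resolution per cell
    WRITE_NOISE = 0.02     # assumed relative error when programming the resistance

    def store_weight(w):
        """Quantize a weight in [0, 1] to one of LEVELS levels, with assumed write noise."""
        level = round(w * (LEVELS - 1))
        programmed = level / (LEVELS - 1)
        return programmed * (1 + random.uniform(-WRITE_NOISE, WRITE_NOISE))

    weight = 0.37
    stored = store_weight(weight)
    print(f"intended {weight}, stored ~{stored:.3f}")

The in-memory-compute part then comes from Ohm's law: drive a voltage across a column of such cells and sum the currents, and you have performed a multiply-accumulate without moving the weights at all.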

3DXpoint

3DXpoint is a phase change memory technology developed by Micron and Intel. Intel markets products in this space under the name Optane. Note that 3DXpoint has nothing to do with the flash business that Intel is in the process of selling to SK Hynix. Intel is keeping Optane and trying to make it a key part of its data center strategy.

Phase change memories work by using heat to change an element from crystalline to amorphous (or back again), which can then be sensed to read, since the two phases have different resistance.

3DXpoint is higher capacity than DRAM and lower cost (per bit), and faster than flash (but slower than DRAM). It thus has the potential to become a new level in the storage hierarchy between DRAM and flash-based SSDs. However, there is a lot of complexity to changing the storage hierarchy, requiring software and operating system support to be able to reuse the preserved data after a power failure or reboot.

To see some of the complexity of this, see my post from January's Persistent Memory Summit, Persistent Memory: We Have Cleared the Tower.

As I said in that post, given its characteristics, 3DXpoint can be used in three ways:

  • You can use persistent memory simply to add more memory; existing applications use it unchanged, and nobody needs to do anything. But then the product has to compete with DRAM on price (per bit, not per chip).
  • Or you can use persistent memory and take advantage of all its features, but then existing operating systems and applications need to change to know about it (see the sketch after this list).
  • There is a halfway point where the operating system is persistent-memory-aware but applications are oblivious. But that only delivers part of the benefit.
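Here is a minimal sketch of the fully-aware case, assuming a persistent-memory filesystem mounted with DAX at /mnt/pmem (a hypothetical path). The application memory-maps a file and updates it in place like ordinary memory, flushing to make the update durable; a real application would more likely use a library such as PMDK than raw mmap:

    import mmap, os

    PATH = "/mnt/pmem/counter.dat"   # assumed DAX mount point, for illustration only

    fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o644)
    os.ftruncate(fd, 4096)                        # one page of persistent state
    with mmap.mmap(fd, 4096) as pm:
        pm[0:8] = (12345).to_bytes(8, "little")   # update in place, like ordinary memory
        pm.flush()                                # make sure the update reaches persistence
    os.close(fd)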

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.