Persistent Memory

30 Jan 2019

Last week was the latest Persistent Memory Summit. In the semiconductor world, we don't usually use that term; we say non-volatile memory. In practice, this mostly means flash memory (mainly 3D NAND today) and embedded flash memory (eFlash). Both persistent memory and non-volatile memory mean memory that doesn't lose its contents when the power is turned off.

When I first learned to program, all memory was persistent since it was ferrite core memory. It was only in the mid-1970s that semiconductor memory became cost-effective. DRAM had been invented in the 1960s by Dennard (yes, he of Dennard scaling), but it was MOSTEK's introduction of 4K DRAM chips that made it economical as main memory for computers. By the late 1970s, MOSTEK had 85% market share. Until then, ferrite core memory was the technology used. I don't remember using its persistence for anything except retaining the bootstrap code. The normal way to reboot a computer in that era was to enter the start address of the bootstrap on the front-panel switches (actually, they would almost always be at that value already from the last time) and start executing code there. The bootstrap code would run and load the operating system from a disk drive, or perhaps even paper tape or punched cards. The operating system would leave the bootstrap untouched so that it would still be there for next time. Once semiconductor memory became standard, the bootstrap had to be kept in read-only memory (ROM), since DRAM would lose its contents when the power was lost or the computer was shut down (actually, the bootstrap in ROM would typically load a secondary bootstrap which, in turn, would load the operating system).

The Past and Near Past

In his wrapup at the end of the summit, Rob Peglar had some diagrams that summarized the story.

In the past, the memory hierarchy looked something like the picture above, with two levels in the hierarchy. The CPU operated at 1-10ns per instruction, and DRAM held the program and data values, operating at 70-100ns. There are actually caches in there to compensate for the mismatch between processor and DRAM speeds, but I'm leaving them out to keep things simple. If the value needed was not in memory (since memory isn't big enough to hold the entire file system, or the database, or even all the code), it would be transferred from a hard-disk drive (HDD), which would take about 10ms (aka 10,000,000ns). So if a value was in memory, it could be accessed in 100ns; if not, it would take 100,000 times as long.

When flash memory became economical in large volumes, the first change to this memory hierarchy was introduced: flash-based solid-state drives (SSDs) began to replace rotating-media HDDs. SSDs were still too expensive to use as the only storage medium, so HDDs remained in use, especially for so-called cold storage: data that might be required one day but probably will not be.

For specialized applications, NVDIMMs could be used to add non-volatile memory that operated at DRAM speeds (but cost more than plain DRAM). These look like normal DRAM DIMMs to the processor, but they also contain a controller, a flash memory, and a battery. When the power fails, the contents of the DRAM are copied into the flash. When power is restored, the saved contents are copied from the flash back into the DRAM.

SSDs were a lot faster than HDDs, so the difference was reduced to 1000X. If a value was in memory, it took 100ns to read it; if it was not, it took 100us.
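The gaps quoted so far can be checked with a few lines of arithmetic. This is only a sketch using the round numbers from the text (100ns for DRAM, 100us for an SSD, 10ms for an HDD); the constant and function names are mine, chosen for illustration.

```python
# Access times quoted in the text, all expressed in nanoseconds.
DRAM_NS = 100            # upper end of the 70-100ns range
SSD_NS = 100 * 1_000     # 100us
HDD_NS = 10 * 1_000_000  # 10ms


def slowdown(miss_ns: int, hit_ns: int = DRAM_NS) -> int:
    """How many times slower a miss is than a hit in DRAM."""
    return miss_ns // hit_ns


hdd_gap = slowdown(HDD_NS)
ssd_gap = slowdown(SSD_NS)

print(f"HDD miss: {hdd_gap:,}x slower than a DRAM hit")
print(f"SSD miss: {ssd_gap:,}x slower than a DRAM hit")
```

With these figures, moving from HDD to SSD shrinks the penalty for missing in memory by a factor of 100, but a miss is still three orders of magnitude slower than a hit.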

The reason that persistent memory is suddenly a hot topic is that we seem to be on the cusp of the next change in the memory hierarchy of computers, starting with servers. If you had to come up with a dream memory, it would be faster than DRAM, higher capacity than DRAM, cheaper than DRAM, and be non-volatile (persistent). Intel and Micron developed the 3DXpoint technology, which comes close. This has the potential to slot in as a new level in the hierarchy. It is not quite as fast as DRAM, operating at about 500ns per access (vs 100ns). However, it is much higher capacity and cheaper. It is not just the timing that is important. Since it is cheaper, more accesses can be satisfied from the persistent memory, and so fewer accesses need to go to the SSD/HDD part of the hierarchy.
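To see why capacity matters as much as raw latency, here is a rough sketch of average access time with and without the new tier. The latencies are the round numbers from the text; the split of accesses between tiers is a purely hypothetical assumption of mine, not a measurement.

```python
DRAM_NS, PMEM_NS, SSD_NS = 100, 500, 100_000


def average_access_ns(tiers):
    """tiers is a list of (fraction_of_accesses, latency_ns) pairs."""
    total = sum(fraction for fraction, _ in tiers)
    assert abs(total - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(fraction * latency for fraction, latency in tiers)


# Hypothetical workload: 99% of accesses hit DRAM, 1% go to the SSD.
without_pmem = average_access_ns([(0.99, DRAM_NS), (0.01, SSD_NS)])

# Hypothetical: persistent memory absorbs 90% of the former SSD accesses.
with_pmem = average_access_ns(
    [(0.99, DRAM_NS), (0.009, PMEM_NS), (0.001, SSD_NS)]
)

print(f"average without persistent memory: {without_pmem:.1f}ns")
print(f"average with persistent memory:    {with_pmem:.1f}ns")
```

Under these made-up fractions, the average drops from roughly 1100ns to roughly 200ns, even though the new tier is five times slower than DRAM. The saving comes almost entirely from avoiding trips to the SSD.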

Persistent memory doesn't lose its value when the power fails, and so has the potential for simplifying software. I'll look at the software side of things in a second post, but just point out here that a lot of complexity in database and file-system software is ensuring that data is all written to disk so that if the power fails, nothing gets lost. With persistent memory, even if the data is not written to disk, it will still be in memory, and so, potentially, after the restart, the program can "carry on where it left off." This happens to be an area of special interest to me. Despite working all my career in EDA, my Ph.D. thesis is on distributed file systems, with an emphasis on recovering after failures.

The opening keynote of the summit was by Frank Hady, an Intel fellow. He told the story of why persistent memory is so important, and why "2019 is going to be a big year" for it. Intel calls their 3DXpoint product line "Optane". In some ways, the keynote is a naked commercial for Optane, but there are several other technologies at various stages in the development pipeline (such as MRAM, ReRAM, and others—I'll discuss them in a future post).

Frank Hady

The need for a memory hierarchy was first articulated in the 1940s by Von Neumann and his colleagues. They realized that you would love to have effectively infinite main memory, but that would be physically (and economically) impossible. They realized that you would need a hierarchy of memories, each with more capacity and lower cost than the previous level, but that you would pay for that in terms of access speed (if it wasn't slower, then you'd simply cut out a level from the hierarchy).

The reason that memory hierarchies (and caches in general) work is locality, or the 90/10 rule, which says that 90% of the accesses go to 10% of the data. This means that ideally each layer has 10X the capacity of the one above it but can have 1/10th of the performance. However, DRAM to NAND is 1000X in both capacity and performance, opening up a gap into which persistent memory fits.
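One way to read the 90/10 argument is as a simplified model in which each level satisfies 90% of the accesses that reach it and the last level catches the rest. The sketch below is my own illustration of that model, not something from the keynote; the function name and the example latencies for the 10X-step hierarchy are assumptions.

```python
def contributions(latencies_ns, hit_rate=0.9):
    """Expected time each level adds to an average access.

    Assumes each level satisfies `hit_rate` of the accesses that reach it,
    and the last level satisfies everything that is left.
    """
    reach = 1.0  # fraction of all accesses that get as far as this level
    result = []
    for i, latency in enumerate(latencies_ns):
        is_last = i == len(latencies_ns) - 1
        served = reach if is_last else reach * hit_rate
        result.append(served * latency)
        reach -= served
    return result


balanced = contributions([100, 1_000, 10_000])  # hypothetical 10X steps
gapped = contributions([100, 100_000])          # DRAM straight to NAND
filled = contributions([100, 500, 100_000])     # persistent memory between

for name, parts in [("balanced", balanced), ("gapped", gapped), ("filled", filled)]:
    rounded = [round(p, 1) for p in parts]
    print(f"{name:8s} per-level ns: {rounded}  total: {sum(parts):.1f}ns")
```

With 10X steps, every level contributes about the same amount to the average, which is what makes the hierarchy balanced. With a 1000X jump, the slow level dominates, and putting a 500ns tier in between removes most of that cost in this toy model.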

Frank had his own version of the memory hierarchy diagram, showing where there are gaps. The capacity gap is that you'd like to have more DRAM but can't afford it. The storage performance gap is that you'd like SSDs to be faster than they are using NAND flash. The cost-performance gap is that you'd like even bigger SSDs before you have to go to cold storage on HDD and tape.

There are three requirements for technologies to close these gaps:

Capacity: you want a lot of memory for your dollars.
System performance: you want it to be faster than the current options.
System fit: it needs to be compatible with existing systems (DIMMs, disk drive slots, datacenter racks etc), otherwise the barrier to transition is too high.

The above pictures show 3DXpoint, symbolically on the left, and a photomicrograph in the center. The black line across the top of the photo is the bitline, the dots are wordlines (coming out of the screen towards you). The structure in each cell is both the selector material and the memory material (so there is no need to have a separate memory element and selector element in each cell). This means that the memory is dense, limited only by the bit and wordline spacing. It is non-volatile. The chips hold 128Gb.

The first place that Intel has been selling Optane is in the form of faster and higher capacity SSDs that fit into existing slots. That is nice to have, but it is not game-changing. Even if the memory access itself were instantaneous, the memory is accessed as a file and takes 4-10us just to get through the software stack (on the left of the above diagram). What is much more interesting is direct access (DAX), where the processor simply reads and writes the persistent memory directly, as if it were DRAM (on the right of the above diagram).
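The difference between the two access styles can be sketched with an ordinary memory-mapped file. This is only an analogy: the temporary file below is a hypothetical stand-in for a file on a DAX-capable file system backed by persistent memory, and on a normal disk the mapping still goes through the page cache. What it does show is the programming model, where file-style access makes a system call per operation while mapped access is plain loads and stores.

```python
import mmap
import os
import tempfile

SIZE = 4096  # one page is enough for the demonstration

# Hypothetical stand-in for a file on a persistent-memory file system.
path = os.path.join(tempfile.mkdtemp(), "pmem_demo.bin")
with open(path, "wb") as f:
    f.truncate(SIZE)

# File-style access: each write is a system call through the storage stack.
with open(path, "r+b") as f:
    f.seek(0)
    f.write(b"via write()")

# Mapped access: the file appears as a byte array in the address space,
# and reads and writes are ordinary memory operations.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as region:
        region[16:26] = b"via mmap()"  # a store, not a write() call
        region.flush()                 # ask for the change to be made durable
        file_style = bytes(region[0:11])
        mapped_style = bytes(region[16:26])

print(file_style, mapped_style)
os.remove(path)
```

Both updates end up in the same file, but only the first one paid for a trip through the file API on every access, which is the overhead the 4-10us figure refers to.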

So in the Intel world, you end up with a memory hierarchy like the above, with Intel NAND SSDs (other manufacturers are available!) with 10s of terabytes and access speeds around 100us. Then a layer of Optane SSD, with a few terabytes and an access speed of 10us, fitting into the same space as any other SSD. Main memory is extended with direct access Optane with 100s of GB and access speeds around 500ns, fitting into DIMM slots. Then DRAM, with 10s of GB and 70-100ns access speed.

Today, Intel seems to be shipping Optane-based SSDs in volume. Although Frank had an Optane DIMM (see the above picture), judging from the complaints during the Q&A, most people can't get their hands on them yet. They are sampling to the biggest customers (presumably the cloud vendors). Google has some servers available with persistent memory, but they are not generally available; you need to get special approval. However, the day before the summit, there had been a hackathon where everyone was using Google servers with persistent memory to experiment with applications that could take advantage of it. Later, Neal Christiansen of Microsoft talked about persistent memory support in Windows Server 2019 (apparently released in late 2018), although he didn't say anything about Azure. My prediction is that Google, Azure, AWS, and others will all provide servers with persistent memory this year.
