
Paul McLellan

Intel's Cascade Lake: Deep Learning, Spectre/Meltdown, Storage Class Memory

14 Sep 2018 • 4 minute read

At the recent HOT CHIPS in Cupertino, Sujal Vora of Intel gave a look inside the Future Intel Xeon Scalable Processor (Codename: Cascade Lake-SP). To make sure we all stayed to the end, this was the last presentation of the conference.

The three focus areas for this processor, in addition to various enhancements, are:

  • Special instructions for deep learning inference
  • Support for storage class memory (Optane in Intel-speak, 3DXpoint in everyone-else-speak)
  • Side channel mitigations (Spectre and Meltdown)

Tick-Tock

The above table shows where Cascade Lake fits in. In Intel's tick-tock terminology, a "tick" is a process shrink of an existing microarchitecture and a "tock" is a new microarchitecture. Haswell was a new microarchitecture (tock) in 22nm, the first FinFET process, back when Intel was still calling it Tri-Gate. This became Broadwell when it was moved to 14nm (tick). Skylake was the next new microarchitecture, still in 14nm (tock). Normally, you would expect that microarchitecture to be moved to 10nm (tick), but Intel's 10nm is late and Cascade Lake is also in 14nm (presumably what Intel has called 14nm++ in other contexts; Sujal just said "process tuning"). Apart from the three focus areas, the core is very similar to the first-generation Xeon Scalable Platform, with the same core count, cache size, and I/O speeds.

Neural Networks

New is VNNI, the Vector Neural Network Instructions. A new instruction, VPDPBUSD, fuses the three instructions in the inner convolution loop that operate on 8-bit data into a single operation. A similar instruction, VPDPWSSD, fuses the two instructions that work on 16-bit data.

The above table shows the number of elements processed per cycle. There is no change for 32-bit floating point, but 16-bit goes from 64 to 128 elements per cycle, and 8-bit from 85.33 (presumably 256/3, since the pre-VNNI sequence takes three instructions per result) to 256 elements per cycle.

Storage Class Memory

3DXpoint memory, which is close to DRAM speed and close to NAND capacity, is the first so-called storage class memory. So far it is the only one, although see my post Carbon Nanotube Memory: Too Good to Be True? about Nantero's technology, also presented at HOT CHIPS. 3DXpoint was co-developed by Intel and Micron, and the Intel version goes under the name Optane.

The biggest challenge in the server memory hierarchy is the huge gap in performance between DRAM and SSD. DRAM is too expensive for truly huge data-intensive applications, but SSD performance is too slow. Between the two technologies, SSD latency is about 1000x that of DRAM, and the bandwidth is just 1/10 as big. On the other hand, DRAM is about 40 times more expensive. You can have fast, or you can have cheap, but not both.

Until now, maybe. Storage class memory slips into the gap: big, affordable, persistent, DDR4 compatible.

However, to take advantage of storage class memory requires changes to the processor and to the operating system. With DRAM, if the power fails, the data will be lost, so there is not much point in the processor taking great care to note whether a write to memory completed before the power went out. That all changes with storage class memory. It is critical to know which writes completed, because when the system restarts, those values will be in memory. The writes that didn't complete will be lost along with the rest of the processor state.

There are also changes required to the operating system and to other persistence-aware applications. When they restart, they can access data in memory without having to re-read it from disk. Because of this capability, less needs to be written out to disk in the first place, since it can be recovered from the storage class memory after a reboot.

Adding a layer of fast persistent memory to the hierarchy changes a lot. However, care needs to be taken to manage what is in cache and what is in the persistence domain, which requires some special instructions to flush data and wait for writes to complete.

Defense Against Side-Channel Attacks

The last focus area is hardware mitigations for side-channel attacks. Intel seems to go out of its way to avoid mentioning the words Spectre and Meltdown, but that is what this is all about. The basic message is that Cascade Lake's hardware mitigations should perform better than software-only mitigations.

The table above is pretty much all the detail given. In particular, "mitigation" can mean anything from "that approach is completely shut down" to "we catch the easy cases but a determined adversary can still get through."

Summary

The last slide of HOT CHIPS summarized Cascade Lake.


Sign up for Sunday Brunch, the weekly Breakfast Bytes email.