
Paul McLellan

After Meltdown and Spectre

14 May 2019 • 7 minute read

At the recent Linley Spring Microprocessor Conference, the second day's keynote was by Jon Masters of Red Hat. He wears two hats (both of them red), since he is responsible both for their Arm product line and for their response to the Spectre and Meltdown vulnerabilities. He's no stranger to Breakfast Bytes, having appeared several times before:

Arm hat:

How ARM Servers Can Take Over the World
Red Hat's Mr. ARM Talks Open Source

Vulnerability hat:

Spectre with a Red Hat
Spectre/Meltdown and What It Means for Future Design

After Meltdown and Spectre

I am assuming here that you already read my post How Do Out-of-Order Processors Work Anyway? or that you already know what speculative execution is.

The original, and most famous, of these vulnerabilities were Meltdown and Spectre. They were discovered in mid-2017, but only announced to the public in the first week of January 2018. During the period between discovery and announcement, processor companies and companies like Red Hat were working out responses. They have since been joined by TLBleed and Foreshadow, and in fact there are many variants. Spectre and Meltdown set the standard: your security vulnerability just isn't hot enough if it doesn't have its own logo.

The amazing thing about these vulnerabilities is that they affect pretty much all high-end microprocessors that have both speculative execution and branch prediction—which is to say, all of them. These are not obscure silicon bugs being exploited, these are the silicon performing perfectly to spec. They are so-called "side-channel attacks" that rely on something other than the primary behavior. The vulnerabilities differ in details, and those details are way beyond a blog post, but they all work roughly like this:

  1. Make sure the cache is cold and doesn't contain cached values of any of the special locations you will use later.
  2. Train the branch predictor so that your code will be speculatively executed, even though the condition guarding it will actually fail.
  3. Load a single 8-bit byte of something you have no access to, like security keys.
  4. Use the 8-bit byte to access one of 256 words of memory depending on its value.
  5. Wait for the speculative execution to notice your code should never have run in the first place and undo it.
  6. Use timing checks to find out which of the 256 words of memory is now in the cache.
  7. Success: you know the value of a byte you should not have been able to read.
  8. Repeat for the next byte.

You might wonder why, at step 3, the processor doesn't trigger an exception. But most processors don't report loads that are not permitted until the instruction is no longer speculative. It's just too complicated to speculatively handle exceptions (mostly).

Responses

 I won't cover all the variants since there are a lot, and just as there are various flavors of vulnerabilities, there are various flavors of response, too.

Meltdown

Meltdown is actually even simpler than the above description since it doesn't require branch predictor training. It relies on the fact that the entire Linux operating system address space is part of every userspace (just marked as inaccessible) so that entering the operating system doesn't require an address space switch, which has a performance penalty. The mitigation is simply to stop doing that. Leave the trampolines for entering the operating system in userspace, but then require an address space switch once the operating system proper has to be entered.

Spectre v1

Spectre relies on finding a "gadget", an existing snippet of code to run in an unmodified existing program binary, that does the middle part of the sequence of operations above and loads secrets. Then branch predictor training (sometimes called poisoning in this case) is used to cause the gadget code to run.

Spectre v1 is known as "bounds check bypass". Generally, if you check that an array offset is in bounds, then the out-of-bounds part of the array will not be accessed. But with speculative execution, it may well be accessed, even though the value will eventually be discarded. And note that "out of bounds" can mean any address in memory, not just a couple of addresses off the end of the array.

This is one of the worst to mitigate, since existing hardware has no way to limit speculation in this case. Instead, the software has to be altered to prevent the speculative load. Most architectures have a special instruction, often called a fence, that stops speculation. This is painful, though: source and binaries have to be scanned to find the offending sequences, which are then altered.

Spectre v2

Spectre v2 relies on training indirect branch prediction, where the branch address is not in the code but in a register. Branch predictors only use some of the low-order bits of the address (12-20 bits), and don't distinguish between addresses in different processes, or between the operating system and userspace. Of course, this reduces the accuracy of the prediction, but only by a trivial amount, and it is a lot cheaper to store 12 bits than 64 for every entry in the branch target buffer (BTB).

Mitigation will be easy in future cores, by adding tags to the BTB. But cloud providers already have millions of existing cores. Full mitigation is probably impossible there. Some cores, with microcode, can be fixed by altering the instruction behavior slightly, but this can be expensive and might add thousands of cycles every time the kernel is entered from userspace.

Google came up with a software solution using "return trampolines", or "retpolines". Since indirect branches and calls are the problem, don't use them: instead, put the target address into a fake return address on the stack, and then "return" to it in place of the indirect call.

NetSpectre

This is a Spectre attack performed over the network using two combined gadgets: a "leak" gadget sets some flag or state during a speculative out-of-bounds access, and a "transmit" gadget then uses the flag in a way that is remotely observable. The attacker trains the leak gadget and then extracts the data with the transmit gadget. As you might guess, the rate at which data can be leaked is fairly slow, on the order of bits per hour, but if those are really important bits, like a security key, it's still serious.

Summary

Solutions in existing processors fall into a number of categories:

  • Operating system solutions: Separate kernel and userspace, alter entry and exit sequences to the kernel, etc.
  • Software solutions: Scan the binary of the operating system or user code and look for the code sequences that cause these problems and then alter them (e.g., by adding fences, or switching indirect calls to use retpolines).
  • Microcode solutions: Alter the microcode in a microcoded processor to mitigate the issue.
  • Chicken bits: Modern processors have a lot of flags (maybe thousands) to control internal operation (in particular to disable broken features and to handle silicon bringup). Certain operations can be disabled with these bits.

Some of these vulnerabilities will be relatively easy and cheap to fix in future processors now that the basic ideas are well understood, such as adding additional tag bits to BTBs, or preventing speculative reads of system registers.

Future architectures might also bring new problems. This is a race between processor designers pushing for ever more performance, and security researchers trying to find ways to turn new features into new side channels. So there will be new vulnerabilities.

Here is one future potential problem that Jon mentioned: value prediction. In this, the value loaded into a register is predicted speculatively, and later when the calculation is complete, either the predicted value is correct, and everything is retired as normal, or it is wrong, and treated similarly to a mispredicted branch. It is not yet common, but some designs already "predict zero" under some circumstances. This alters the timing, and so potentially an attacker can find a way to notice the change in timing.

Spectre v1 is one of the hardest problems to solve. There are incomplete solutions, such as limiting speculation in critical code and not keeping secrets around longer than necessary, but there is no fully general solution that can simply be added to future processors. Remember, the reason we do speculative execution in the first place is that every fifth instruction is a branch. Without speculative execution of instructions beyond conditional branches, the performance of a modern microprocessor would be about 5% of what we get today. That is clearly not going to happen.

Of course, no processor designer is going to design any future processor without considering these issues: they are firmly on everyone's radar. Probably we need a fundamental re-think of security versus performance and "architecture 2.0".

But also, as Jon put it in his last slides:

Changes to how we design software are required. All self-respecting software engineers should have some knowledge of how processors really behave, just as a professional race car driver is expected to know a lot about the machine. We need to stop treating "hardware" people and "software" people as "us" and "them".

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.