Paul McLellan

Tags: security, risc-v, Spectre, Paul Kocher, sifive

Paul Kocher: Differential Power Analysis and Spectre

22 Feb 2018 • 11 minute read

Paul Kocher is a legend in security. A couple of weeks ago, SiFive hosted a seminar by Paul. They do these regularly, usually on a small scale, but this one required a large conference room in the San Mateo Marriott across from their offices. Even though it was in the middle of DesignCon, I felt I should drive up to San Mateo to cover it.

spectre paper

Paul is the lead author of the paper on the Spectre vulnerability. When he started to write the paper, he was the only author. However, when he disclosed what he had found to the microprocessor and operating system companies, to allow them to fix what they could before publication, he found out that there were other groups who had discovered similar things and reported them, too. They combined what they had done into a single paper.

DPA

Before discovering Spectre, Paul was already a legend in the world where security and semiconductors meet. In 1995, he founded a company in San Francisco, Cryptography Research. The focus of some of their work was on building semiconductor products that were resistant to attack, part of which involved working out how to attack them.

Cryptography Research was also responsible for the BD+ security algorithm used in Blu-ray (I bet if you had to type that, you'd get the punctuation of Blu-ray wrong, just like everyone puts the hyphen in the wrong place in H-1B visa). One problem with digital rights management (DRM) is avoiding BOBE, which stands for break-once-break-everywhere. If the DRM uses any sort of master key, then extracting the master key from one disc also unlocks all other discs, something that BD+ avoided.

But Paul Kocher's big breakthrough was discovering that software algorithms running on real chips had a vulnerability that nobody else had bothered to think of. Instead of attacking the encryption software, attack the chip on which it runs. He did this by measuring the power consumption of the chip, clock cycle by clock cycle. This is known as differential power analysis (DPA). Since nobody designing chips had thought that bad guys might attack the chip directly, they hadn't thought much about defense mechanisms. It was like the early days of operating systems, when people didn't really think that viruses and other attacks would become significant, so didn't worry about them. When a PC was just an isolated computer with little power, there wasn't much reason (or mechanism) to attack it. But when PCs were running ATMs, banking systems, and DRM systems at major movie studios, that balance changed. Paul's paper has been cited 6743 times, making it one of the most highly cited papers in security.
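The statistical core of DPA can be sketched in a few lines. The following is a toy simulation, not an attack on real hardware: power samples are modeled as the Hamming weight of a key-dependent intermediate value plus Gaussian noise, and the attacker correlates each key guess's predicted leakage against the measured traces. The S-box, key, trace count, and noise level here are all invented for illustration.

```python
import random

random.seed(1)  # deterministic toy data

HW = [bin(v).count("1") for v in range(256)]  # Hamming weight of each byte

# A stand-in substitution box; a real attack would target, e.g., AES's S-box.
SBOX = list(range(256))
random.shuffle(SBOX)

def simulate_trace(pt, key, noise=2.0):
    """One power sample: Hamming weight of the S-box output, plus noise."""
    return HW[SBOX[pt ^ key]] + random.gauss(0, noise)

SECRET_KEY = 0x3C
plaintexts = [random.randrange(256) for _ in range(2000)]
traces = [simulate_trace(p, SECRET_KEY) for p in plaintexts]

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# For each key guess, correlate the predicted leakage against the traces;
# the correct key stands out with by far the highest correlation.
best_guess = max(range(256),
                 key=lambda k: corr([HW[SBOX[p ^ k]] for p in plaintexts],
                                    traces))
print(hex(best_guess))
```

With these parameters, the correct key byte 0x3C is recovered despite the noise, which is the essence of why "within about a minute" is plausible on real hardware.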

differential power analysis paper

Unlike so many security breaches that sound theoretically interesting but are implausible as real-world attack vectors, DPA turned out to be a real vulnerability. I've actually seen this done, and I wrote about it in EDPS Cyber Security Workshop: "Anything Beats Attacking the Crypto Directly". In that demonstration, there was a little single-board computer with a chip that implemented AES encryption. They ran some encryption on the board while monitoring the power consumption, and within about a minute they had managed to read out the AES key. Needless to say, this should not be possible, and it means they could have decrypted anything the board was encrypting, had it been in real use.

Encryption like this is used on small computer systems for real, though. You probably have several of these little computer systems in your wallet or purse. Plus you probably have one in your phone. I don't mean in the phone itself, I mean in the SIM card. Credit and debit cards with a chip, and SIM cards, are all examples of smart cards, many manufactured by Gemalto in France. It turned out that they were vulnerable to DPA.

(It also led to a great exit. Rambus purchased Cryptography Research, and Paul remained at Rambus for a few years. A couple of years ago, he moved on. As did Megan Wachs, who worked at Cryptography Research and then Rambus, before moving on to SiFive, where she is the lead for their Freedom Everywhere platform. So if you were wondering how and why SiFive got Paul to come and present to them, now you know.)

Paul's skill is to approach things like processors with a security mindset. This is different from the mindset of a software or hardware engineer, who tends to know little about security and to think about it only when it becomes a problem. I think it was Bruce Schneier, in one of his books, who nicely summed up the security mindset when he talked about getting an ant farm as a kid. It comes with a card that you send off, and they mail you the ants to go in it. Normal people just see a way to get the ants for their farm. With the security mindset, you see a way to mail a packet of ants to anybody in the US you choose to annoy.

Spectre

So Paul moved on from Rambus and, having nothing specific to do, he started looking at the processor in his PC and wondering in what sort of way it might be vulnerable. As a result, he discovered the Spectre vulnerability. As Megan put it in her introduction, "This is the first cool thing he has done since he left Rambus." Like DPA, the vulnerability had been hiding in plain sight for decades. I should emphasize that Spectre is really an attack on the basic approach to building a high-performance microprocessor, not an attack on a specific processor with a bug in it. Since, to a first approximation, all high-performance processors make extensive use of the same three techniques, they are all vulnerable. Those three techniques are cache memories, speculative execution, and branch prediction.

As processors got faster and faster, memories did not. The first step to address this was to add cache memories: smaller, faster memories used to hold recently accessed data, on the basis that anything accessed recently will probably be accessed again soon.
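That locality principle can be illustrated with a minimal sketch: a tiny fully associative cache with least-recently-used eviction (the capacity and addresses are arbitrary, chosen only for the example).

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative cache with least-recently-used eviction."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = OrderedDict()   # ordered from least to most recent
        self.hits = self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)          # mark as most recent
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)    # evict least recent
            self.lines[addr] = True

cache = LRUCache()
for addr in [1, 2, 3, 1, 2, 3, 1, 2, 3]:  # a hot working set that fits
    cache.access(addr)
print(cache.hits, cache.misses)  # 6 hits, 3 misses: only the cold start misses
```

Once the working set is resident, every access hits; this is exactly the behavior that speculation later learns to exploit as a side channel.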

Another development was out-of-order (OOO) execution. Simple processors execute instructions in the order in which they appear. Obviously, if you increase the clock rate, you can execute instructions faster, but you can do better than that by executing more than one instruction simultaneously. However, this takes care, since you can't execute one instruction if it depends on the result of a previous instruction. The way to handle this efficiently is quite old, dating back to Robert Tomasulo's seminal paper from 1967. This paper invented much of the terminology that is still used today. In a coincidental link to semiconductor design, Lynn Conway (of Mead & Conway fame) worked in this area at IBM (under a different name) and is credited with the invention of generalized dynamic instruction handling.

tomasulo paper

Processors got faster and faster, and could have large numbers of execution units. When a cache miss occurs, and an item of data really does need to be fetched from memory, it can take as long as executing 200 instructions. As Paul put it:

It is toxic to be doing nothing for 200 cycles while you wait for something from memory.

But even with clever scheduling algorithms, there are almost never sequences of 200 instructions that occur without a conditional branch. Since a branch is conditional, until the condition is evaluated, whether the branch will be taken or not is unknown. But with 200 slots of instructions to fill, the processor has to make a guess and run with it. If it guesses right, time has been saved. If it guesses wrong, then nothing was lost compared to stalling, but the instructions which were executed unnecessarily need to be cleaned up.

The process of guessing whether the branch will be taken is known as branch prediction. In its earliest form, the guess "assume every branch does whatever it did last time" works surprisingly well, and requires storing only a single bit per branch. In particular, it works well for the branch at the end of a loop (since typically the loop is traversed many times), and for branches that check for rare events (such as an array index being out of bounds), since by definition the rare case seldom occurs. With more advanced branch predictors, the accuracy reaches 99+%, which also means that speculative execution can get so far ahead that it is evaluating several iterations of a loop. A nuance that turns out to be important is that the branch predictor usually uses only a subset of the instruction address, for efficiency. Occasionally this means it will mispredict, since it is mixing up two loops, or even predictions from another process entirely. But in normal circumstances that is benign, and just causes a tiny loss of performance in exchange for a big saving in implementation resources on the chip. For example, Intel uses the low-order 22 bits of the address, and AMD uses the low-order 12 bits.
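The "do whatever it did last time" scheme, including the address aliasing just described, can be sketched in a few lines (the table size, branch address, and loop shape are invented for illustration):

```python
def one_bit_predictor(branch_outcomes, table_bits=4):
    """1-bit 'same as last time' predictor, indexed by the low-order bits
    of the branch address (so distinct branches can alias). Returns the
    fraction of predictions that were correct."""
    table = {}
    correct = 0
    for addr, taken in branch_outcomes:
        idx = addr & ((1 << table_bits) - 1)   # only low-order address bits
        if table.get(idx, True) == taken:      # predict last outcome seen
            correct += 1
        table[idx] = taken                     # remember this outcome
    return correct / len(branch_outcomes)

# A loop branch: taken 99 times, then falls through once, repeated 10 times.
loop = [(0x4000, True)] * 99 + [(0x4000, False)]
accuracy = one_bit_predictor(loop * 10)
print(accuracy)  # 0.981: it mispredicts only at each loop exit and re-entry
```

Even this one-bit scheme gets 98% on a loop-heavy pattern, which is why such cheap predictors were worth building; and the `idx` masking is the aliasing that Spectre's training step abuses.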

Talking of AMD, when Spectre and Meltdown were first announced, AMD said that they were not vulnerable. But, in fact, Paul says, they are.

Spectre relies on two interactions of all this technology. The first (like Meltdown, the other vulnerability) is that speculative execution, even if eventually discarded, can pull items from memory into the cache. If the instructions that are executed speculatively are under the control of the bad guy, some aspects of what they did can be derived by seeing what is in the cache (which can be determined by instruction timing). The other is that the branch predictor can be trained. Since the high-order bits of the branch address are ignored, the predictor can be trained by making it evaluate a loop many times, and then, in a different process, that training can be used to ensure that a chosen piece of code is executed speculatively. Paul calls this code "the gadget". One challenge is that the code needs to already exist to be usable this way. However, his experiments showed that it was not hard to find suitable code when he searched the whole address space. Unlike return-oriented programming (ROP), which requires the gadget code to end with a return, any code with the desired instructions will do. When the speculative execution fails and the results are discarded, control goes back to where the erroneous branch to the gadget took place.
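The logic of the leak, stripped of all hardware detail, can be modeled as a toy sketch. To be clear, this is not attack code: real Spectre exploits are written in C and time cache lines with cycle counters, whereas here a Python set stands in for the cache, an unconditional branch stands in for the misprediction, and the secret string is invented.

```python
SECRET = "sifive"          # hypothetical secret sitting past the array
public_array = [0] * 16
cache = set()              # stands in for which cache lines got filled

def victim(index):
    """Architecturally, the bounds check forbids out-of-bounds reads. But a
    mispredicted branch would speculatively run the body anyway, leaving a
    secret-dependent line in the cache before the result is discarded."""
    memory = public_array + list(SECRET.encode())  # secret lives past the end
    if True:  # models the mispredicted 'index < len(public_array)' check
        value = memory[index]
        cache.add(value)    # the "gadget": a secret-dependent cache fill

def attacker_recover(offset):
    cache.clear()
    victim(len(public_array) + offset)   # out-of-bounds, "speculative" read
    # Probe which of the 256 possible lines is now cached. A real attack
    # does this with Flush+Reload timing; set membership models it here.
    return next(b for b in range(256) if b in cache)

leaked = bytes(attacker_recover(i) for i in range(len(SECRET)))
print(leaked)  # b'sifive'
```

The two ingredients from the paragraph above are both visible: the discarded speculative work still mutates the cache, and the cache state is readable afterwards one byte at a time.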

Combining all this, Paul was able to search through Windows DLLs to find the instructions he needed, and then also control which memory location the victim would leak.

Mitigations

spectre mitigations

So what do you do about this? Everything is a band-aid since all mitigations are "super messy".

Intel has the capability to update microcode in their processors. They have been getting some criticism for their attempts, but at least they have that option. Both Intel and Arm have defined instructions like LFENCE that act as speculation barriers. However, in an experiment that put the barrier into just one loop, performance dropped by 50%, so it is not really practical to put one after every conditional branch. And Microsoft Office has 24M conditional branches, so it is not feasible to work out which ones are actually vulnerable.
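Besides the LFENCE barrier, one mitigation used in practice is to clamp array indices with branch-free arithmetic, the idea behind the Linux kernel's array_index_nospec(). A sketch of the masking trick, assuming a non-negative index that fits in 63 bits (Python's arbitrary-precision integers make the arithmetic shift behave like a sign-extended machine shift):

```python
def array_index_mask_nospec(index, size):
    """Branch-free clamp modeled on Linux's array_index_nospec(): returns
    the index unchanged when in bounds, 0 otherwise, using only arithmetic.
    With no conditional branch, there is nothing to mispredict."""
    mask = (index - size) >> 63   # -1 (all ones) if index < size, else 0
    return index & mask

print(array_index_mask_nospec(5, 16))   # 5  (in bounds: unchanged)
print(array_index_mask_nospec(40, 16))  # 0  (out of bounds: forced to 0)
```

Because the clamp costs a subtract, a shift, and an AND rather than a serializing barrier, it avoids the 50% slowdown, but it has to be applied by hand at each bounds check the programmer judges to be exposed.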

Looking Forward

There is a big gap between the processor developers and what the software people assume about the hardware, between the architecture and the micro-architecture. The software engineers make certain assumptions, the hardware designers make different ones. Exploits can squeeze into the gap.

There are lots of side-channels besides the ones that Paul discovered that are grouped as Spectre. Computation time depends on how well the caches, speculation, and branch prediction work. There are timing impacts on one thread due to contention with others. There are analog effects more like differential power analysis: power, heat.

The fundamental problem is that platforms are not designed for security. Worse, today's code may be fine but turn out to be insecure on future chips with even higher performance. Any fix will be incomplete, and other attack variants may be able to get around it. Future chips may comply with the architecture but have different side channels. To make it worse, none of this is documented, since a lot is considered proprietary: nobody publishes how their branch predictor works, or how the algorithms in the cache (typically for eviction) work.

spectre implications

Everything is designed for performance, not security. So we have three options, and no Goldilocks option: fast and dangerous (today); safe but slower (limit speculation); or punt the problem from hardware to software and make the compiler put LFENCE in front of any branch that might be "exploitable". In a multicore processor, Paul feels there is no reason that one core could not be optimized for security rather than performance. So, like Arm's big.LITTLE, you would have FAST.secure or something.

Spectre has been a shock across the industry, since it is not really fixable. We are not used to hearing about exploits until they have already been fixed. The most urgent places to fix are the kernel, sandboxing systems for things like JavaScript, web APIs, database servers, webservers, and cryptographic code. But that is a lot. Plus, the complexity of the fixes can create new vulnerabilities.

The process we have for fixing software vulnerabilities works well (tell the software supplier, give them an embargo time to fix, then publish). However, the process doesn't work for hardware since, if you told everyone who needs to know, then that is more people than can keep a secret. There are too many parties, none in control, and conflicting agendas. DPA was the first hardware vulnerability that required mitigation across a whole ecosystem. This is the second. It won't be the last.

Paul's final conclusion:

These should have been found 15 years ago, not by me, in my spare time, since I quit my job and was at a loose end. This is going to be a decade-long slog.

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.