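Primes are really important in encryption. In fact, the foundation of internet security depends on a couple of things that seem as though they could not both be true at once: we can tell whether a very large number is prime, yet we have no practical way to factorize a very large number.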
In some sense, this is amazing. We can decide that a number is prime, without trying to factorize it. In fact, without any way to factorize it even if we wanted to. We're talking about large numbers here—think hundreds of digits—since obviously for smaller numbers we can just try all the possibilities.
This post will get pretty mathematical, so if you take nothing else away, remember this. The main algorithm used for internet security is called RSA. It relies on finding two large prime numbers (think 100 digits) and multiplying them together (to get a 200-digit number). There is no known practical way, given the 200-digit number, to discover the two 100-digit numbers that we multiplied together (trying all possibilities would take longer than the age of the universe).
I recommend watching this video that explains it in more detail...but uses the fact that it is easy to mix paint and impossible to work out the colors that were mixed. The first 2 minutes of the video does no math at all, just paint.
When you go to most websites today, you will see a little padlock to the left of the URL above the main window in your browser. This indicates that it is a secure encrypted connection. The encryption is done using a symmetric algorithm since these don't require a huge amount of computation. A symmetric algorithm means that the key used for decoding and encoding is the same. However, it suffers from the problem of all symmetric algorithms: how do you get the key from the side that originates it (say your browser) to the other side (the website) without that transmission being open to interception? If the bad guy gets the key you are using, she (traditionally she is called Eve, for "eavesdropper") can read all the communications between your browser and the website, say your bank, which is obviously not good.
The key transmission is handled using an asymmetric approach, known as public key cryptography, which uses two keys: a public key that doesn't need to be kept secret, and a private key which is...well, private. To communicate with the website, the browser does some stuff with the public key, sends a message, and only the website with the private key can decode it. This algorithm is computationally expensive, so it is not used for transmitting all the real data; it is just used for the initial key exchange. For more details on public key cryptography, start with the video above.
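To make "does some stuff with the public key" a little more concrete, here is a toy sketch of textbook RSA in Python. The tiny primes, the public exponent 17, and the names `encrypt` and `decrypt` are all illustrative assumptions, not what a browser really does; real implementations use enormous primes plus padding schemes that are omitted here.

```python
# Toy RSA with tiny primes: an illustration only, never secure at this size.
from math import gcd

p, q = 61, 53              # the two secret primes (real ones have hundreds of digits)
n = p * q                  # 3233, the public modulus
phi = (p - 1) * (q - 1)    # 3120, easy to compute only if you know p and q

e = 17                     # public exponent (assumed choice; must share no factor with phi)
assert gcd(e, phi) == 1
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi (needs Python 3.8+)


def encrypt(message: int) -> int:
    """Anyone can do this using only the public key (n, e)."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Only the holder of the private exponent d can undo it."""
    return pow(ciphertext, d, n)


session_key = 65           # stands in for the symmetric key the browser wants to send
ciphertext = encrypt(session_key)
print(ciphertext, decrypt(ciphertext))
```

The point of the sketch is that computing `d` required `phi`, and `phi` required the two primes. Someone who sees only `n` and `e` has to factorize `n` first.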
The algorithm used is known as RSA (from the names of the inventors) and relies on something that seems impossible: if you take two very large prime numbers (say 100 or more digits) and multiply them together, then there is no known efficient algorithm to take that big number (now perhaps 200 digits) and find the original two factors. Because of our experience with small numbers, it is surprising that this is a hard problem. After all, if I give you a number like 51 and ask you to factorize it, it doesn't take a lot of thought to realize it is 3×17. But even for relatively small numbers, it gets hard fast. For example, 1961 is 53×37 but it would take quite a bit of time to discover that by hand.
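As a rough illustration of how the effort grows, here is a minimal Python sketch that factorizes by plain trial division and counts the divisions it needed. The function name `smallest_factor` is just a label for this example. Real factoring algorithms are far cleverer than this, but none of the known ones running on today's computers is fast enough for 200-digit numbers.

```python
def smallest_factor(n: int):
    """Return (smallest factor of n, number of trial divisions used)."""
    tried = 0
    candidate = 2
    while candidate * candidate <= n:
        tried += 1
        if n % candidate == 0:
            return candidate, tried
        candidate += 1
    return n, tried  # nothing up to sqrt(n) divides n, so n itself is prime


for n in (51, 1961):
    factor, tried = smallest_factor(n)
    print(f"{n} = {factor} x {n // factor} after {tried} trial divisions")
# 51 = 3 x 17 after 2 trial divisions
# 1961 = 37 x 53 after 36 trial divisions
```

Going the other way, multiplying 37 by 53 is a single operation, and that asymmetry is what RSA leans on.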
If I asked you to find all the primes less than, say, 500 (and you didn't write a computer program) then you'd probably do one of two things. The first is to write down all the numbers, then strike out all the even numbers after 2 (if you even bothered to write them), then all the multiples of 3 after 3 itself, then all the multiples of 5, and so on. This is a very old algorithm known as the Sieve of Eratosthenes. The other approach would be to take each number in turn and see if it has factors. You can get a little smarter than this since it is obvious that any number bigger than 5 that ends in an even digit or in 5 is not a prime. Also, you only need to try numbers up to the square root of the number you are testing, since if a number has a big factor then it also has a small one (so 51 is divisible by 17, but we only needed to test up to the (rounded-up) square root, 8, since it must have a small factor too, which it does, namely 3).
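Both pencil-and-paper approaches are easy to sketch in Python. This is a minimal version for illustration; the function names `sieve` and `is_prime_by_trial` are my own, and `sieve` assumes a limit of at least 2.

```python
from math import isqrt


def sieve(limit: int) -> list:
    """Primes below `limit`, found by striking out multiples (Sieve of Eratosthenes)."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]              # 0 and 1 are not prime
    for n in range(2, isqrt(limit - 1) + 1):    # only need to sieve up to the square root
        if is_prime[n]:
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def is_prime_by_trial(n: int) -> bool:
    """Test a single number by trying every divisor up to its square root."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


primes = sieve(500)
print(len(primes), primes[:10], primes[-1])
# 95 [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] 499
```

Both give the same 95 primes below 500. Neither approach scales to numbers with hundreds of digits, which is why the next part matters.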
Computers don't find primes like that. How do they find them?
There are several very old algorithms that have the characteristic that if a number fails the test then it is certainly not prime, and if it passes the test it is probably prime. "Probably" means that very few numbers pass the test without being prime. For some purposes, probably is good enough. But we can do better. If we use two or more of these algorithms, and they all say the number is prime, then it is even more likely. Two such algorithms are the Fermat (yes, he of the last theorem) probable prime test and the Lucas probable prime test. When suitably strengthened versions of the two are combined (the combination is known as the Baillie-PSW test), there are no known numbers that pass both and are not prime. However, nobody has proved that none exist.
The Fermat probable prime test relies on something called Fermat's Little Theorem. His Big Theorem is also his Last Theorem, which was finally proved by Andrew Wiles in 1994. Fermat's Little Theorem says that if p is prime, then for any integer a, the number a^p − a is an integer multiple of p. If we try lots of values for a and they all pass this test, then p is probably prime. If it fails for any, then it is certainly not prime. Numbers that pass the test but are not prime are known as Fermat Pseudoprimes.
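Here is a minimal Python sketch of that test. The function name `fermat_test` and the default choice of bases are assumptions made for the example. It uses Python's built-in three-argument `pow`, which computes a^n modulo n without ever building the enormous number a^n.

```python
def fermat_test(n: int, bases=(2, 3, 5, 7)) -> bool:
    """True if a**n - a is a multiple of n for every base a that we try."""
    if n < 2:
        return False
    return all(pow(a, n, n) == a % n for a in bases)


print(fermat_test(97))                # True: 97 is prime, so every base passes
print(fermat_test(341, bases=(2,)))   # True: 341 = 11 x 31 fools base 2 on its own
print(fermat_test(341))               # False: base 3 exposes it
print(fermat_test(561))               # True: 561 = 3 x 11 x 17 passes for every base
```

The last line is the catch. A few composite numbers, such as 561, pass no matter how many values of a you try, which is one reason to pair this test with a different kind of test.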
You probably know the Fibonacci series 1,1,2,3,5,8,13... where each number is the sum of the previous two, and the initial two numbers are set to 1 and 1. There is a similar series called the Lucas series where the initial numbers are set to 2 and 1 (some versions start with 1 and 3, which produces the same series without the leading 2). These are named after Édouard Lucas (yes, another Frenchman). The series goes 2,1,3,4,7,11,18... and can be used as a test for primality. Counting the leading 2 as the 0th Lucas number, the test is that if p is prime, then the p'th Lucas number less 1 is divisible by p (or in more mathematical terms, the p'th Lucas number is congruent to 1 mod p). For example, the 7th Lucas number is 29, and 28 is divisible by 7. The converse is almost true, namely that if n is not prime (composite), then one less than the n'th Lucas number is usually not divisible by n. "Almost" here means, as above, that very few composite numbers pass the test; those that do are known as Lucas pseudoprimes.
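A minimal Python sketch of the Lucas-number test follows. It assumes the series is indexed from zero (so 2 is the 0th element and 1 is the 1st), and the function names are made up for this example. It walks the series one step at a time, keeping only remainders, which is fine for small numbers but far too slow for cryptographic sizes.

```python
def lucas_number_mod(n: int, modulus: int) -> int:
    """The n'th Lucas number (L0 = 2, L1 = 1), reduced modulo `modulus`."""
    a, b = 2 % modulus, 1 % modulus
    for _ in range(n):
        a, b = b, (a + b) % modulus
    return a


def lucas_number_test(n: int) -> bool:
    """True if the n'th Lucas number is congruent to 1 mod n."""
    if n < 2:
        return False
    return lucas_number_mod(n, n) == 1


print([n for n in range(2, 30) if lucas_number_test(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(lucas_number_test(705))
# True, although 705 = 3 x 5 x 47 is composite
```

So 705 is one of those rare composites that slip through this particular test.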
As I said above, for the versions of these tests used in practice there are no known numbers that pass both without being prime, so if a number passes both tests we can be as sure as is practical that it is prime. Various other tests can be added to make this even more certain, and there are lots of ways to make the testing faster than the way I explained it above.
For really, really large primes, a different approach is used. In fact, all the largest known primes (the largest known has over 23 million digits) are Mersenne Primes, of the form 2^p-1. It is straightforward to show that if the exponent p is not a prime, then 2^p-1 is not a prime either. Unfortunately, the converse is not true. Not all numbers of that form are prime even when p is prime. In fact, for large p, most of them are not. It requires a lot of searching to find the ones that are.
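Both halves of that statement can be checked on small cases with a few lines of Python. This is only a spot check under my own choice of examples, not a proof. The idea behind the first half is that when p = a×b, the number 2^a-1 always divides 2^p-1.

```python
def mersenne(p: int) -> int:
    """The Mersenne number 2**p - 1."""
    return 2**p - 1


# Composite exponent: 15 = 3 x 5, so both 2**3 - 1 and 2**5 - 1 divide 2**15 - 1.
print(mersenne(15), mersenne(15) % mersenne(3), mersenne(15) % mersenne(5))
# 32767 0 0

# Prime exponent is not enough: 11 is prime, but 2**11 - 1 is not.
print(mersenne(11), 23 * 89)
# 2047 2047
```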
GIMPS is the Great Internet Mersenne Prime Search. Like SETI@home, which searches for extraterrestrial radio signals using the idle time on thousands of individuals' computers, GIMPS looks for Mersenne Primes using the idle time on a huge network of computers that are each parceled out part of the overall search. I'll tell you the algorithm they use. Note that this approach only works for Mersenne primes, not any old number.
We can then use the tests I described above to find whether the number we are considering is "probably" prime. That's good enough for cryptography but "probably" is not good enough for mathematics.
To do better, we need to use another series, the Lucas-Lehmer series (originally developed by Édouard Lucas in the 1870s and refined by Derrick Lehmer in the 1930s, so this is not a computer thing). This series is defined as starting with 4, with each later element being the square of the previous one minus 2. This gets large really fast, pretty much doubling the number of digits every time. The first few elements are 4,14,194,37634,1416317954... We are going to use this series as part of the primality test, but it turns out we don't need to calculate the precise values, which get explosively large very fast. We just need the remainder of each element after dividing by our trial number 2^p-1 (this is known as arithmetic modulo 2^p-1). Let's call this the modified Lucas-Lehmer series. Then the primality test is that, for an odd prime p, 2^p-1 is prime if and only if (mathematicians usually write this 'iff') the (p-1)st element of the modified Lucas-Lehmer series is 0 (which is the same thing as saying that the (p-1)st element of the original Lucas-Lehmer series is divisible by 2^p-1).
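The test is short enough to sketch in a few lines of Python. This is a minimal illustration with a function name of my own choosing; it is not the heavily optimized code that GIMPS actually runs, and it assumes the exponent p passed in is an odd prime.

```python
def lucas_lehmer(p: int) -> bool:
    """True if 2**p - 1 is prime. Assumes p is an odd prime."""
    m = (1 << p) - 1          # the trial number 2**p - 1
    s = 4 % m                 # first element of the modified series
    for _ in range(p - 2):    # p - 2 more steps reach the (p-1)st element
        s = (s * s - 2) % m
    return s == 0


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
print([p for p in odd_primes if lucas_lehmer(p)])
# [3, 5, 7, 13, 17, 19, 31]
```

For p=5 the trial number is 31 and the modified series runs 4, 14, 8, 0, so the 4th element is 0 and 31 is prime. For p=11 the series never reaches 0 at the 10th element, matching the fact that 2047 is 23×89.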
Note that one feature of the GIMPS approach to searching for Mersenne primes, where the calculations are distributed, is that sometimes a Mersenne Prime is found before all the smaller candidates have been tested, so it is not known which Mersenne Prime it is (the 47th? the 48th?). This came up just this year. 2^p-1 was discovered to be prime for p=43112609, at the time the 45th known Mersenne Prime, but it was only in April that all the smaller candidates had been tested (two of which turned out to be prime), so it is now officially the 47th Mersenne Prime.
You can join the Mersenne Prime Search from the GIMPS website.
I said above that there is no way to factorize very large numbers. In fact, this seems to be true only in the world of the computers we have today, what the quantum people call "classical" computers (in an analogy to classical physics, not involving quantum mechanics or, perhaps, relativity). It seems that this will not be true if we can build sufficiently big (in the number of qubits) quantum computers. For now, we can't, so internet security is still secure. But there is a lot of work in progress on encryption algorithms that are quantum-resistant, that will still be secure even when we have high-qubit quantum computers.
But don't be surprised if, one day, you wake up to hear the news that "internet security is broken."