There are three major events in computers learning to play board games at a very high level: Deep Blue's 1997 defeat of world chess champion Garry Kasparov, AlphaGo's 2016 defeat of top Go professional Lee Sedol, and AlphaZero's 2017 defeat of Stockfish, the strongest chess program, after teaching itself the game from scratch.
You have probably heard a lot about the first two, but the third is probably the most important. Let's start, though, with Kasparov's defeat.
In 1997, IBM's computer Deep Blue beat Garry Kasparov, the world champion at the time. It was the first defeat of a world champion by a computer under tournament conditions. In some ways, it wasn't that Deep Blue cracked some secret about how to play chess, just that it was one of the world's most powerful computers at the time, and if you throw enough computing power at a combinatorial problem then even simple algorithms can search deep enough. It was perceived as a victory for brute force rather than a breakthrough in artificial intelligence. If you want to know more, there is a documentary, Game Over: Kasparov and the Machine. As a sign of just how fast technology was advancing, this was the second matchup of Kasparov and Deep Blue. There had been one the year before, in 1996, which Kasparov had won.
It is worth pointing out that chess has been played for a long time. There has been a huge amount of analysis, especially of openings and endgames, which are well understood. Deep Blue had all of that built in, of course. There are also thousands (millions?) of high-level games available for analysis and training, and there are chess computers, so a young player can leverage all of this and train against programs rated higher than any human player. As a result, chess grandmasters are getting younger: Bobby Fischer's record of becoming a grandmaster at 15½, which stood for 33 years, has since been beaten by dozens of young players. Instead of spending years in book learning, studying great games, openings, and so on, young players can just play against chess programs on a PC that already embody all that prior knowledge. Even so, it takes nearly 10 years from first learning the basics to reaching the grandmaster title. For example, Magnus Carlsen, the current world champion, became a grandmaster at just 13½, having been introduced to chess at the age of 5 and played his first tournament at the age of 8.
With the world champion defeated, there wasn't a lot more to be done in computer chess (although we'll come back to chess later), and the man-versus-machine arena moved to Go, considered a more difficult game for a computer. Indeed, at the time of Kasparov's defeat, there were people saying that Go would remain too complex for the foreseeable future. Go has simpler rules than chess, but many more possible moves at each point in the game, and is considered more intuitive and less analytical than chess.
Traditionally, game-playing programs use a technique called minimax. The basic idea is to look at all the moves you can make, then all the moves your opponent can make in reply, then all your responses to those, and so on. Searched exhaustively, this tree explodes combinatorially, so for games like chess or Go the search has to be cut off at some depth. The name comes from the fact that you pick the move that maximizes the minimum value of the position you can be forced into; that is, it guarantees you end up in the best possible place even if your opponent plays perfectly. But it requires an algorithm that can assess how good a given position is, and that is somewhat of a black art, since ultimately a good position is one you can win from.
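The idea above fits in a few lines. This is a minimal sketch, not any particular engine's code: the game interface is hypothetical, with the caller supplying legal_moves(position), apply(position, move), and evaluate(position), where larger evaluations are better for the maximizing player. As in a real engine, the search is cut off at a depth limit, at which point the evaluation function is consulted.

```python
def minimax(position, depth, maximizing, legal_moves, apply, evaluate):
    """Depth-limited minimax over a caller-supplied game interface."""
    moves = legal_moves(position)
    if depth == 0 or not moves:      # depth limit reached, or game over
        return evaluate(position)
    if maximizing:                   # our move: take the best continuation
        return max(minimax(apply(position, m), depth - 1, False,
                           legal_moves, apply, evaluate) for m in moves)
    else:                            # opponent's move: assume the worst for us
        return min(minimax(apply(position, m), depth - 1, True,
                           legal_moves, apply, evaluate) for m in moves)
```

Note that the quality of the result depends entirely on evaluate: the search only guarantees the best reachable position as that function measures it.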
It reminds me of a lot of EDA algorithms. For example, the definition of a good placement is one that can be routed well—but routing is far too expensive to run as part of placement, so other heuristics need to be used. One approach to pruning the tree, known as alpha-beta pruning, is to cut off a branch as soon as you have already found a better alternative elsewhere. But that assumes that the evaluation function is accurate and doesn't miss moves that superficially look poor but in a few moves' time lead to good outcomes.
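Alpha-beta pruning can be sketched as follows, again over a hypothetical game interface (caller-supplied legal_moves, apply, and evaluate functions, not any real engine's API). Here alpha is the best score the maximizer is already guaranteed elsewhere and beta the best the minimizer is guaranteed; when they cross, the rest of the branch cannot affect the result and is cut.

```python
import math

def alphabeta(position, depth, alpha, beta, maximizing,
              legal_moves, apply, evaluate):
    """Minimax with alpha-beta cutoffs; returns the same value as
    plain minimax but skips branches that cannot change the answer."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)
    if maximizing:
        best = -math.inf
        for m in moves:
            best = max(best, alphabeta(apply(position, m), depth - 1,
                                       alpha, beta, False,
                                       legal_moves, apply, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:    # minimizer would never allow this line
                break
        return best
    else:
        best = math.inf
        for m in moves:
            best = min(best, alphabeta(apply(position, m), depth - 1,
                                       alpha, beta, True,
                                       legal_moves, apply, evaluate))
            beta = min(beta, best)
            if alpha >= beta:    # maximizer already has something better
                break
        return best
```

The pruning is exact only with respect to the evaluation function, which is precisely the weakness noted above: a branch that looks bad to evaluate may be cut even if it leads somewhere good.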
Chess and Go are both full of moves like that, so the naive approach won't work. Under the hood of AlphaGo are two neural networks: one for assessing the value of a position, and one for deciding which moves to continue to analyze and which to prune. These are called the value network and the policy network. AlphaGo was trained on a database of around 30 million moves, initially with the goal of getting it to make the same moves as humans would make. Then it was further trained by reinforcement learning, playing against itself.
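The division of labor between the two networks can be sketched roughly. To be clear, AlphaGo actually couples its networks with Monte Carlo tree search; this is a much-simplified illustration in which a stand-in policy_net ranks moves (pruning all but the top few) and a stand-in value_net scores positions at the search frontier. Every function name here is a hypothetical placeholder supplied by the caller, not DeepMind's API.

```python
def guided_search(position, depth, maximizing, legal_moves, apply,
                  policy_net, value_net, top_k=3):
    """Depth-limited search where a policy function prunes the move
    list and a value function scores frontier positions."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        # Value network's job: "how good is this position?"
        return value_net(position)
    # Policy network's job: "which moves are worth analyzing?"
    ranked = sorted(moves, key=lambda m: policy_net(position, m),
                    reverse=True)
    scores = [guided_search(apply(position, m), depth - 1, not maximizing,
                            legal_moves, apply, policy_net, value_net, top_k)
              for m in ranked[:top_k]]
    return max(scores) if maximizing else min(scores)
```

The point of the structure is that a learned policy keeps the branching factor manageable without a hand-written heuristic, and a learned value function replaces the hand-tuned position evaluator—the two black arts of classical game programs.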
AlphaZero (and a forerunner, AlphaGo Zero) is a different version of the program. The "zero" in the name indicates that it starts from nothing: it is given only the rules of the game. It can play chess, Go, or shogi, a Japanese game similar to chess. The training is done entirely without any knowledge of previous games, human play, tactics, or strategy. The implementation has a lot of power behind it, using 5,000 first-generation Google Tensor Processing Units (TPUs) to generate games for analysis, and 64 second-generation TPUs to train the neural networks.
AlphaZero beat Stockfish, the best computer chess program in the world, after just nine hours of training. (There is some debate as to whether the version of Stockfish it played was optimally configured, running with 64 threads and a one-gigabyte hash table, but that is a wrinkle.)
This is amazing in so many ways. I said above that a human with access to the best computer chess programs in the world, and to centuries of accumulated analysis, takes eight or ten years to reach grandmaster standard. AlphaZero did it from a standing start in less than a day. Yes, chess is an artificial environment and this is not general intelligence. But it is so far beyond human performance as to be hard to believe. The nearest any human has come to attempting something similar is described in A Chess Novice Challenged Magnus Carlsen. He Had One Month to Train, which ended as you would expect. In fact, since Stockfish can already beat any human player, AlphaZero didn't just, in effect, beat a novice who trained for a month; it beat all the grandmasters who trained for decades.
More significant, perhaps, is that AlphaZero didn't just beat Stockfish; it beat Stockfish's programmers. That is, in a few hours it surpassed the years and years of work that have gone into making computer chess as good as it can be, and better than humans. Everything that has ever been done in chess, in both human play and computer implementations, was no match for half a day of training on AlphaZero's (admittedly extreme) hardware.
What's next? Maybe, given the rules of arithmetic, a few hours later it will discover differential and integral calculus, matrix algebra, group theory, non-Euclidean geometry, and all the other facets of advanced mathematics that are the accumulated knowledge of millions of people over hundreds of years.
Of course, the basic approach only works in a closed domain with simple rules, where a lot of contact with the real world is not required. Discovering how DNA works, or astronomical dark matter, or the relationships of subatomic particles, requires observations, not just running AI algorithms against themselves. But the whole world is actually built up from simple building blocks: cells, molecules, atoms, particles, quarks.
Even if we don't understand the details, we know that, by definition, consciousness and intelligence in humans are emergent behaviors of atoms and chemistry (or just physics if you go lower still).
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.