Google's DeepMind's AlphaFold Solves Protein Folding

3 Dec 2020 • 3 minute read

Solving protein folding has been a challenge for at least 50 years. You probably know that proteins are made up of amino acids, of which there are just 20, linked into long chains. How a protein behaves depends on how it folds up. For example, almost all enzymes are proteins, as are many hormones and all antibodies. So understanding how proteins fold is really important for understanding how the digestive system works, how the immune system works, and so on.

The problem is that although proteins are chains 50 to 2000 amino acids long, they don't stay stretched out: they fold up into compact shapes. Collagen, for example, is 1050 amino acids long. The challenge is that although the fold is determined by organic chemistry and the forces between atoms, predicting it has proved intractably hard. It has simply not been possible to take a chain of amino acids and compute what shape the protein will take.

The only approaches that work, and only sometimes, are experimental ones such as X-ray crystallography and cryo-electron microscopy. Famously, X-ray crystallography was one of the things that led to the breakthrough of finding the structure of DNA (which is not a protein, since it is not built out of amino acids).

Here is a 2-minute video that explains protein folding:

I'm not a molecular biologist, obviously. So I had no idea that there is a biennial competition called CASP, which has run since 1994. As it says in this Nature article:

DeepMind’s program, called AlphaFold, outperformed around 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on 30 November, at the start of the conference—held virtually this year—that takes stock of the exercise.

The title of the article is even more dramatic:

‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures.
Google’s deep-learning program for determining the 3D shapes of proteins stands to transform biology, say scientists.

The program has already solved the structures of some proteins that people had tried and failed to determine for over a decade.

DeepMind is the same group within Google that created AlphaGo and AlphaZero. See my posts about them:

In their own blog post over at DeepMind, the team shows just how much things have improved over the last fifteen years:

Apparently, a score of around 90 (out of 100) is considered to be the gold standard, equivalent to the best that the alternative experimental methods achieve when they work. AlphaFold 2 is pretty much there.
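The score in question is CASP's GDT (global distance test), which is roughly the percentage of residues in a prediction that land close to their experimentally measured positions. To give a feel for the arithmetic, here is a minimal Python sketch. It is my own simplification, not CASP's scoring code: it assumes the two structures are already superimposed and the same length, whereas the real metric also searches for the best superposition. The function name `gdt_ts` and the toy coordinates are made up for illustration.

```python
import math

def gdt_ts(predicted, reference, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score on a 0-100 scale.

    For each distance threshold (in angstroms), compute the fraction of
    residues whose predicted position is within that distance of the
    reference position, then average the fractions.
    Assumption: the structures are already superimposed.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and the same length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    fractions = [
        sum(d <= t for d in distances) / len(distances) for t in thresholds
    ]
    return 100.0 * sum(fractions) / len(thresholds)

# Toy example: four residues, with prediction errors of 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (21.4, 0.0, 0.0)]

score = gdt_ts(predicted, reference)
print(score)  # 56.25
```

In the toy example, one residue in four is within 1 angstrom, two within 2, and three within both 4 and 8, so the score is the average of 25, 50, 75 and 75.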

Here are a couple of paragraphs from that same blog post that give some idea of how it works:

A folded protein can be thought of as a “spatial graph”, where residues are the nodes and edges connect the residues in close proximity. This graph is important for understanding the physical interactions within proteins, as well as their evolutionary history. For the latest version of AlphaFold, used at CASP14, we created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph, while reasoning over the implicit graph that it’s building. It uses evolutionarily related sequences, multiple sequence alignment (MSA), and a representation of amino acid residue pairs to refine this graph. By iterating this process, the system develops strong predictions of the underlying physical structure of the protein and is able to determine highly-accurate structures in a matter of days. Additionally, AlphaFold can predict which parts of each predicted protein structure are reliable using an internal confidence measure.
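To make the "spatial graph" idea concrete, here is a small Python sketch that builds such a graph from 3D coordinates: residues are the nodes, and an edge joins any two residues that sit close together in space. This only illustrates the data structure and has nothing to do with AlphaFold's actual code. The coordinates are invented, and the distance cutoff and the rule of skipping immediate chain neighbors are my own assumptions.

```python
import math

# Hypothetical toy data: one representative point per residue, in angstroms.
# A real structure would be read from a Protein Data Bank file.
residues = [
    ("MET", (0.0, 0.0, 0.0)),
    ("LYS", (3.8, 0.0, 0.0)),
    ("THR", (7.6, 0.0, 0.0)),
    ("ALA", (7.6, 3.8, 0.0)),
    ("GLY", (3.8, 3.8, 0.0)),
]

def contact_graph(residues, cutoff=8.0, min_separation=2):
    """Return an adjacency dict mapping residue index -> set of neighbor indices.

    Two residues are connected if they are within `cutoff` angstroms of each
    other and at least `min_separation` positions apart along the chain
    (immediate chain neighbors are always close, so they are skipped).
    Both parameters are illustrative assumptions.
    """
    graph = {i: set() for i in range(len(residues))}
    for i in range(len(residues)):
        for j in range(i + min_separation, len(residues)):
            if math.dist(residues[i][1], residues[j][1]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph

graph = contact_graph(residues, cutoff=6.0)
for index, neighbors in graph.items():
    print(residues[index][0], sorted(neighbors))
```

In this toy chain, which bends back on itself, the last residue (GLY) ends up connected to the first three, even though it is far from MET along the chain. Those long-range contacts are exactly what a folded structure adds over the bare sequence.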

We trained this system on publicly available data consisting of ~170,000 protein structures from the protein data bank together with large databases containing protein sequences of unknown structure. It uses approximately 128 TPUv3 cores (roughly equivalent to ~100-200 GPUs) run over a few weeks, which is a relatively modest amount of compute in the context of most large state-of-the-art models used in machine learning today. As with our CASP13 AlphaFold system, we are preparing a paper on our system to submit to a peer-reviewed journal in due course.

Here is an 8-minute video about the making of this breakthrough:

I'll give the last word to Mohammed AlQuraishi, a computational biologist at Columbia University in New York City:

It’s a breakthrough of the first order, certainly one of the most significant scientific results of my lifetime.
