Science News headlines, in your inboxHeadlines and summaries of the latest Science News articles, delivered to your email inbox every Thursday.
Subscribe to Science NewsGet great science journalism, from the most trusted source, delivered to your doorstep.
Plodding progressProteins start out as long chains of amino acids and fold into a host of curlicues and other 3-D shapes. Some resemble the tight corkscrew ringlets of a 1980s perm or the pleats of an accordion. Others could be mistaken for a child’s spiraling scribbles. A protein’s architecture is more than just aesthetics; it can determine how that protein functions. For instance, proteins called enzymes need a pocket where they can capture small molecules and carry out chemical reactions. And proteins that work in a protein complex, two or more proteins interacting like parts of a machine, need the right shapes to snap into formation with their partners. Knowing the folds, coils and loops of a protein’s shape may help scientists decipher how, for example, a mutation alters that shape to cause disease. That knowledge could also help researchers make better vaccines and drugs. For years, scientists have bombarded protein crystals with X-rays, flash frozen cells and examined them under highpowered electron microscopes, and used other methods to discover the secrets of protein shapes. Such experimental methods take “a lot of personnel time, a lot of effort and a lot of money. So it’s been slow,” says Tamir Gonen, a membrane biophysicist and Howard Hughes Medical Institute investigator at the David Geffen School of Medicine at UCLA. Such meticulous and expensive experimental work has uncovered the 3-D structures of more than 194,000 proteins, their data files stored in the Protein Data Bank, supported by a consortium of research organizations. But the accelerating pace at which geneticists are deciphering the DNA instructions for making proteins has far outstripped structural biologists’ ability to keep up, says systems biologist Nazim Bouatta of Harvard Medical School. “The question for structural biologists was, how do we close the gap?” he says. For many researchers, the dream has been to have computer programs that could examine the DNA of a gene and predict how the protein it encodes would fold into a 3-D shape.
Here comes AlphaFoldOver many decades, scientists made progress toward that AI goal. But “until two years ago, we were really a long way from anything like a good solution,” says John Moult, a computational biologist at the University of Maryland’s Rockville campus. Moult is one of the organizers of a competition: the Critical Assessment of protein Structure Prediction, or CASP. Organizers give competitors a set of proteins for their algorithms to fold and compare the machines’ predictions against experimentally determined structures. Most AIs failed to get close to the actual shapes of the proteins. Then in 2020, showed up in a big way, predicting the structures of 90 percent of test proteins with high accuracy, including two-thirds with accuracy rivaling experimental methods. Deciphering the structure of single proteins had been the core of the CASP competition since its inception in 1994. With AlphaFold’s performance, “suddenly, that was essentially done,” Moult says. Since AlphaFold’s 2021 release, more than half a million scientists have accessed its database, Hassabis said in the news briefing. Some researchers, for example, have used AlphaFold’s predictions to help them get closer to completing a massive biological puzzle: the nuclear pore complex. Nuclear pores are key portals that allow molecules in and out of cell nuclei. Without the pores, cells wouldn’t work properly. Each pore is huge, relatively speaking, composed of about 1,000 pieces of 30 or so different proteins. Researchers had previously managed to place about 30 percent of the pieces in the puzzle.
That puzzle is now , after combining AlphaFold predictions with experimental techniques to understand how the pieces fit together, researchers reported in the June 10 Science.Now that AlphaFold has pretty much solved how to fold single proteins, this year CASP organizers are asking teams to work on the next challenges: Predict the structures of RNA molecules and model how proteins interact with each other and with other molecules. For those sorts of tasks, Moult says, deep-learning AI methods “look promising but have not yet delivered the goods.”
Where AI falls shortBeing able to model protein interactions would be a big advantage because most proteins don’t operate in isolation. They work with other proteins or other molecules in cells. But AlphaFold’s accuracy at predicting how the shapes of two proteins might change when the proteins interact are “nowhere near” that of its spot-on projections for a slew of single proteins, says Forman-Kay, the University of Toronto protein biophysicist. That’s something AlphaFold’s creators acknowledge too. The AI trained to fold proteins by examining the contours of known structures. And many fewer multiprotein complexes than single proteins have been solved experimentally.
Forman-Kay studies proteins that refuse to be confined to any particular shape. These intrinsically disordered proteins are typically as floppy as wet noodles (SN: 2/9/13, p. 26). Some will fold into defined forms when they interact with other proteins or molecules. And they can fold into new shapes when paired with different proteins or molecules to do various jobs.AlphaFold’s reach a high confidence level for about 60 percent of wiggly proteins that Forman-Kay and colleagues examined, the team reported in a preliminary study posted in February at bioRxiv.org. Often the program depicts the shapeshifters as long corkscrews called alpha helices. Forman-Kay’s group compared AlphaFold’s predictions for three disordered proteins with experimental data. The structure that the AI assigned to a protein called alpha-synuclein resembles the shape that the protein takes when it interacts with lipids, the team found. But that’s not the way the protein looks all the time. For another protein, called eukaryotic translation initiation factor 4E-binding protein 2, AlphaFold predicted a mishmash of the protein’s two shapes when working with two different partners. That Frankenstein structure, which doesn’t exist in actual organisms, could mislead researchers about how the protein works, Forman-Kay and colleagues say. AlphaFold may also be a little too rigid in its predictions. A static “structure doesn’t tell you everything about how a protein works,” says Jane Dyson, a structural biologist at the Scripps Research Institute in La Jolla, Calif. Even single proteins with generally well-defined structures aren’t frozen in space. Enzymes, for example, undergo small shape changes when shepherding chemical reactions. If you ask AlphaFold to predict the structure of an enzyme, it will show a fixed image that may closely resemble what scientists have determined by X-ray crystallography, Dyson says. “But [it will] not show you any of the subtleties that are changing as the different partners” interact with the enzyme. “The dynamics are what Mr. AlphaFold can’t give you,” Dyson says.
A revolution in the makingThe computer renderings do give biologists a head start on solving problems such as how a drug might interact with a protein. But scientists should remember one thing: “These are models,” not experimentally deciphered structures, says Gonen, at UCLA. He uses AlphaFold’s protein predictions to help make sense of experimental data, but he worries that researchers will accept the AI’s predictions as gospel. If that happens, “the risk is that it will become harder and harder and harder to justify why you need to solve an experimental structure.” That could lead to reduced funding, talent and other resources for the types of experiments needed to check the computer’s work and forge new ground, he says. Harvard Medical School’s Bouatta is more optimistic. He thinks that researchers probably don’t need to invest experimental resources in the types of proteins that AlphaFold does a good job of predicting, which should help structural biologists triage where to put their time and money. “There are proteins for which AlphaFold is still struggling,” Bouatta agrees. Researchers should spend their capital there, he says. “Maybe if we generate more [experimental] data for those challenging proteins, we could use them for retraining another AI system” that could make even better predictions. He and colleagues have already reverse engineered AlphaFold to make a version called OpenFold that researchers can train to solve other problems, such as those gnarly but important protein complexes.
Massive amounts of DNA generated by the Human Genome Project have made a wide range of biological discoveries possible and opened up new fields of research (SN: 2/12/22, p. 22). Having structural information on 200 million proteins could be similarly revolutionary, Bouatta says.In the future, thanks to AlphaFold and its AI kin, he says, “we don’t even know what sorts of questions we might be asking.”