Google DeepMind and Isomorphic Labs reveal AI able to predict large swathes of molecular biology


Alphabet’s Google DeepMind and its sister company Isomorphic Labs have created a new AI model that they say can help predict both the structure and interaction of most molecules involved in biological processes, including proteins, DNA, RNA, and some of the chemicals used to create new medicines.

The new model is a potentially giant leap for biological research. The companies are allowing researchers working on non-commercial projects to query the model for free through an internet-based interface.

Isomorphic Labs, which was spun out of Google DeepMind, has also begun using the system internally to speed its efforts to discover new drugs. The company currently has partnerships with Eli Lilly and Novartis aimed at developing multiple drugs, although the specifics of which diseases the companies are targeting have not been revealed.

Proteins are the building blocks of life, and their interactions with one another and with other molecules are the mechanism through which life’s processes happen. Being able to predict those interactions more accurately will help researchers advance science by helping them understand the mechanisms behind diseases and, potentially, how to better treat and cure them.

Called AlphaFold 3, the new AI software represents a major update and expansion of capabilities beyond Google DeepMind’s previous AlphaFold 2 system. Researchers from the companies published a paper on AlphaFold 3 today in the prestigious scientific journal Nature.

Demis Hassabis, who serves as CEO of both Google DeepMind and Isomorphic, described the new model’s interaction predictions as “incredibly important for drug discovery.”

John Jumper, the senior researcher who heads the protein structure team at Google DeepMind, described AlphaFold 3 as “an evolution of AlphaFold 2, but a really big one that opens up new avenues.” He also said he was excited to see what researchers would do with the new model, noting that AlphaFold 2 had already opened up new areas of biological research that he could never have imagined. AlphaFold 2 has been cited more than 20,000 times in other published scientific papers and has been used to work on drugs for malaria, cancer, and many other diseases.

AlphaFold 2 and 3

Debuted in late 2020, AlphaFold 2 solved a grand scientific challenge because it was able to accurately predict the structure of most proteins simply from their DNA sequence. The company later published the system’s predicted structures for all 200 million proteins with known DNA sequences and made them freely available to scientists in a massive database. Prior to this, only about 100,000 proteins had known structural information.

Knowing the shape and structure of a protein is often a key part of understanding how it will function. But proteins do not work in isolation. And AlphaFold 2 was not designed to predict how proteins would interact with one another—although scientists soon found ways to modify AlphaFold 2 to make some of these predictions. Nor could AlphaFold 2 predict protein interactions with other kinds of molecules, such as DNA, RNA, ligands, and ions, that are found inside living things. It also could not predict the interaction of these other molecules with one another. AlphaFold 3 can.

The system is not always accurate, but it represents a major leap forward in performance. According to tests conducted by Google DeepMind and Isomorphic, AlphaFold 3 can accurately predict 76% of protein interactions with small molecules, compared to 52% for the previous best predictive software. It can accurately predict 65% of DNA interactions, compared to 28% for the next leading system. And for protein-to-protein interactions, it predicts 62% accurately, more than double what AlphaFold 2 could do.
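To put those figures side by side, the short Python sketch below works out the gain in percentage points and the ratio of new to old accuracy. It uses only the numbers quoted above; the variable and function names are illustrative. The article gives no AlphaFold 2 figure for protein-to-protein interactions, so the code only derives the ceiling implied by "more than double."

```python
# Accuracy figures quoted in the article, in percent:
# (AlphaFold 3, previous best system).
reported = {
    "protein and small-molecule interactions": (76, 52),
    "DNA interactions": (65, 28),
}


def improvement(new, old):
    """Return (gain in percentage points, ratio of new accuracy to old)."""
    return new - old, new / old


for name, (new, old) in reported.items():
    gain, ratio = improvement(new, old)
    print(f"{name}: +{gain} points, {ratio:.2f}x the previous best")

# AlphaFold 3's 62% on protein-to-protein interactions is described as more
# than double AlphaFold 2's result, so the older figure must be below 62 / 2.
implied_af2_ceiling = 62 / 2
print(f"Implied AlphaFold 2 protein-to-protein accuracy: below {implied_af2_ceiling:.0f}%")
```

On these numbers, the small-molecule result is a gain of 24 percentage points, while the DNA result is a gain of 37 points and more than twice the previous best.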

Like AlphaFold 2, AlphaFold 3 also includes a confidence score alongside its predictions that gives scientists some indication of whether they should trust the system’s output. This reduces the chance that the AI model will experience the sort of “hallucinations” (plausible but inaccurate outputs) that have plagued recent generative AI models.

Jumper said that so far researchers have found these confidence scores to be highly correlated with whether the structural and interaction predictions are accurate. In other words, the system is not likely to be confidently wrong.

There are a few classes of proteins where AlphaFold 3 is still not accurate. These include proteins that scientists consider “intrinsically disordered,” meaning they only assume a particular structure in the presence of another protein or molecule, perhaps changing their shape radically depending on circumstance, according to Max Jaderberg, the chief AI scientist at Isomorphic Labs.

Bioweapons worries

Many have warned that rapid advances in AI may lead to the proliferation of bioweapons by radically lowering the knowledge barrier to creating deadly pathogens. They include Mustafa Suleyman, a Google DeepMind cofounder who is now heading up a new consumer AI division at Microsoft, and Dario Amodei, the cofounder and CEO of Google DeepMind rival Anthropic. But Jumper said Google DeepMind and Isomorphic had consulted more than 50 experts in biosecurity, bioethics, and AI safety and concluded that the marginal risk AlphaFold 3 might present in terms of bioweapons creation was far outweighed by the system’s potential benefits to science, including advancing human understanding of disease and finding possible treatments.

The two companies are also only allowing access to the model through an internet service that allows outside researchers to prompt the system and receive a prediction, but does not give them access to the model itself or its underlying computer code.

Unlike some efforts to create large language models (LLMs) for biology that can be prompted in natural language to produce a formula for a compound with particular properties, AlphaFold 3 still requires someone to have a fairly good understanding of biology to use it effectively. In addition, any suggested molecular structure it predicts would still need to be produced or isolated in a lab, a process that also requires relatively specialized knowledge.

AlphaFold 3 uses a significantly different AI design than its predecessor AlphaFold 2. While both AI models are based around transformers, a kind of artificial neural network architecture pioneered by Google researchers in 2017, Jumper said the team working on the new system replaced entire “blocks” of the large transformer that powered AlphaFold 2.

AlphaFold 2 relied heavily on evolutionary information about the proteins for which it was trying to predict structures, while AlphaFold 3 leans on this evolutionary signal far less, using it only at the first step of its structure prediction. Instead, the new system devotes the majority of its components to working through the physical shape of the molecules it is making predictions about.

AlphaFold 3 also uses a diffusion model, similar to ones used for popular text-to-image generation models such as OpenAI’s DALL-E 3 or Midjourney, to learn how to puzzle out the precise atomic structures of molecules. Overall, despite covering far more substances than AlphaFold 2, AlphaFold 3 is a simpler design, with fewer separate components, than its predecessor.
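The general idea behind a diffusion model, starting from random noise and refining it step by step, can be shown with a toy Python sketch. This is not AlphaFold 3's code, and the companies' actual design is not described here in enough detail to reproduce. The function names, the noise schedule, and the "oracle" denoiser, which simply returns the known answer in place of a trained neural network, are all assumptions made for illustration.

```python
import random


def oracle_denoiser(noisy_atoms, sigma, true_atoms):
    """Hypothetical stand-in for a trained network.

    A real model would estimate the clean coordinates from the noisy ones;
    this toy version simply returns the known answer.
    """
    return true_atoms


def denoise_structure(true_atoms, steps=50, sigma_max=10.0, seed=0):
    """Refine random 3D coordinates toward a structure, one noise level at a time."""
    rng = random.Random(seed)
    # Noise levels shrinking linearly from sigma_max down to zero (an assumed schedule).
    sigmas = [sigma_max * (1 - i / steps) for i in range(steps + 1)]
    # Start from pure Gaussian noise: one random (x, y, z) position per atom.
    atoms = [[rng.gauss(0.0, sigma_max) for _ in range(3)] for _ in true_atoms]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        estimate = oracle_denoiser(atoms, sigma, true_atoms)
        for atom, clean in zip(atoms, estimate):
            for k in range(3):
                # Euler step: move along the estimated noise direction,
                # scaled by how much the noise level drops in this step.
                direction = (atom[k] - clean[k]) / sigma
                atom[k] += (sigma_next - sigma) * direction
    return atoms


# Three made-up atom positions, used only to exercise the loop.
toy_structure = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]
refined = denoise_structure(toy_structure)
```

Because the stand-in denoiser already knows the answer, the loop ends on the toy coordinates. In a real system the denoiser is a learned network, and the quality of its estimates determines how accurate the final structure is.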

This story was originally featured on Fortune.com
