A program has predicted the structures of almost all 200 million proteins known to science: how is this possible?

The researchers built a database of 200 million protein structures using AlphaFold, a program that DeepMind first developed in 2018 and released as open-source software in July 2021. The program predicts a protein's three-dimensional structure from the sequence of its amino acids, the building blocks of proteins. A protein's structure largely determines its function, so the database of structures predicted by AlphaFold should help researchers identify new protein functions that people can put to use.

The protein paradox

Proteins are the building blocks of life. They are produced by organisms ranging from bacteria to plants and animals, and once a protein chain is made, it folds into shape within milliseconds. Proteins are chains of amino acids folded into complex shapes, and their three-dimensional structure largely determines their function. If you know how a protein is shaped, you can understand how it works and how its behavior changes.

Although DNA provides the instructions for building a chain of amino acids, predicting how those amino acids interact to fold into a three-dimensional shape has been very difficult. Until recently, scientists had determined the structures of only a small fraction of the 200 million proteins known to science. The problem is that protein structures are so complex that it is almost impossible to guess what shape a given chain will take.

Cyrus Levinthal, an American molecular biologist, described a paradox in a 1969 article: although each protein can in principle adopt an enormous number of configurations (around 10^300), proteins fold quickly and accurately into the right one.

Levinthal reasoned that if a protein had to find its correct shape by trying each configuration one by one, the search would take longer than the age of the universe.
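Levinthal's argument can be checked with rough arithmetic. The sketch below is illustrative only: it uses the commonly cited estimate of 10^300 conformations, while the sampling rate of 10^13 conformations per second is a hypothetical, deliberately generous assumption rather than a figure from Levinthal or from this article. Working with base-10 exponents keeps the huge numbers manageable.

```python
import math

# Illustrative assumptions (not figures from the article):
CONFORMATIONS_LOG10 = 300          # commonly cited estimate: about 10^300 conformations
SAMPLES_PER_SECOND_LOG10 = 13      # hypothetical rate: 10^13 conformations tried per second
AGE_OF_UNIVERSE_YEARS = 13.8e9     # approximate age of the universe
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def search_time_log10(conformations_log10: float, rate_log10: float) -> float:
    """Return log10 of the seconds needed to try every conformation one by one."""
    return conformations_log10 - rate_log10


search_log10 = search_time_log10(CONFORMATIONS_LOG10, SAMPLES_PER_SECOND_LOG10)
universe_log10 = math.log10(AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)

print(f"exhaustive search: about 10^{search_log10:.0f} seconds")
print(f"age of universe:   about 10^{universe_log10:.1f} seconds")
print(f"search takes about 10^{search_log10 - universe_log10:.0f} times longer")
```

Even if the assumed sampling rate were off by many orders of magnitude, the exhaustive search would still dwarf the age of the universe, which is the heart of the paradox.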

Efforts by scientists

Scientists have ways of visualizing proteins and analyzing their structure, but the work is slow and difficult. According to Nature, X-ray crystallography is the most common way to determine protein structures. In this method, X-rays are directed at solid protein crystals, and researchers measure how the rays are diffracted. From the resulting pattern, they work out how the protein's atoms are arranged. According to DeepMind, this experimental work has determined the structures of about 190,000 proteins.
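The measurement step can be illustrated with Bragg's law, n·λ = 2·d·sin θ, which relates the angle θ at which X-rays are diffracted to the spacing d between repeating planes of atoms in a crystal. This is a minimal sketch: the wavelength of about 1.5418 Å (copper K-alpha radiation, a common laboratory source) is an assumed example value, the function name is hypothetical, and real structure determination involves far more than this one relation (thousands of reflections, phasing, and model building).

```python
import math

# Assumed example wavelength in angstroms (copper K-alpha laboratory source).
CU_K_ALPHA_ANGSTROM = 1.5418


def bragg_spacing(theta_deg: float,
                  wavelength: float = CU_K_ALPHA_ANGSTROM,
                  order: int = 1) -> float:
    """Solve Bragg's law n*lambda = 2*d*sin(theta) for the plane spacing d.

    The result is in the same units as the wavelength.
    """
    if not 0 < theta_deg < 90:
        raise ValueError("theta must be strictly between 0 and 90 degrees")
    theta = math.radians(theta_deg)
    return order * wavelength / (2 * math.sin(theta))


# Larger diffraction angles correspond to finer spacings, i.e. finer structural detail.
for angle in (10.0, 20.0, 30.0):
    print(f"theta = {angle:4.1f} deg -> d = {bragg_spacing(angle):.3f} angstrom")
```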

New method

In November 2020, the artificial intelligence company DeepMind announced a new version of AlphaFold, which uses an algorithm to predict protein structures quickly. Since then, the program has been applied to the genetic sequences of essentially every organism whose genome has been sequenced, predicting the structures of the hundreds of millions of proteins they contain.

AlphaFold works by learning patterns in amino acid sequences and in the interactions between amino acids, and using them to infer protein structures. As a result, the algorithm can predict a protein's shape within minutes, often with atomic-level accuracy.
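What "atomic-level accuracy" means can be made concrete by comparing a predicted structure with an experimentally determined one. One common measure (among several used to assess structure predictions) is the root-mean-square deviation (RMSD) between matching atoms. This sketch assumes the two structures have already been superimposed and that atoms are paired one to one; the function name and coordinates are hypothetical, with coordinates in angstroms.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(predicted: Sequence[Point], experimental: Sequence[Point]) -> float:
    """Root-mean-square deviation between paired atoms.

    Assumes both coordinate lists are already superimposed and in the same atom order.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, experimental):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


# Hypothetical three-atom example: each predicted atom is off by a fraction of an angstrom.
predicted_atoms = [(0.0, 0.0, 0.0), (1.5, 0.1, 0.0), (3.0, 0.0, 0.2)]
experimental_atoms = [(0.1, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(f"RMSD = {rmsd(predicted_atoms, experimental_atoms):.3f} angstrom")
```

A lower RMSD means the prediction sits closer to the experimental structure; values on the order of one angstrom correspond to differences comparable to the length of a chemical bond.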

Last year, DeepMind published predicted structures for 20 species in an open protein structure database, including almost all of the roughly 20,000 proteins expressed in the human body. It has now completed the work and released predicted structures for more than 200 million proteins.
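Because the database is open, individual predicted structures can be downloaded programmatically. The sketch below assumes the file-naming pattern used by the AlphaFold Protein Structure Database hosted by EMBL-EBI (`AF-<UniProt accession>-F1-model_v<version>.pdb`); the helper names are hypothetical, and the model version suffix changes between database releases, so check the database's own documentation before relying on it.

```python
import urllib.request

# Assumed URL pattern for the public AlphaFold Protein Structure Database;
# verify against the current documentation, since the version suffix changes.
BASE_URL = "https://alphafold.ebi.ac.uk/files"


def alphafold_model_url(uniprot_accession: str, version: int = 4, fmt: str = "pdb") -> str:
    """Build the download URL for one predicted structure (hypothetical helper)."""
    accession = uniprot_accession.strip().upper()
    if not accession.isalnum():
        raise ValueError(f"unexpected UniProt accession: {uniprot_accession!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError("fmt must be 'pdb' or 'cif'")
    return f"{BASE_URL}/AF-{accession}-F1-model_v{version}.{fmt}"


def fetch_model(uniprot_accession: str, version: int = 4) -> str:
    """Download a predicted structure as text (requires network access)."""
    url = alphafold_model_url(uniprot_accession, version)
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, `alphafold_model_url("P69905")` builds the address for the model of the protein with UniProt accession P69905; passing the result to `fetch_model` would download it if the assumed pattern still holds.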

How is the technology being used?

Researchers are already using the fruits of AlphaFold's work. According to The Guardian, the program allowed scientists to finally describe the structure of a key protein of the malaria parasite that X-ray crystallography alone had not been able to resolve. This could eventually lead to a better vaccine against the disease.

Researcher Vilde Leipart of the Norwegian University of Life Sciences used AlphaFold to determine the structure of vitellogenin, a protein involved in reproduction and immunity that is produced by all egg-laying animals. The finding should help in developing new ways to protect animals such as honeybees and fish from disease. This matters because these animals are important to the human food supply.

The program is also informing the search for new drugs, says Rosana Kapeller, CEO of ROME Therapeutics, in a statement released by DeepMind. "AlphaFold's speed and accuracy accelerate drug development. We are just beginning to understand its impact on pharmaceutical development," she concluded.

AlphaFold models are also being used by scientists at the University of Portsmouth's Centre for Enzyme Innovation to identify enzymes from the natural world that can be adapted to break down and recycle plastics.