Just as ChatGPT generates text by predicting the word most likely to follow in a sequence, a new artificial intelligence (AI) model can write, from scratch, new proteins that do not occur naturally.
The scientists used the new model, ESM3, to create a new fluorescent protein that shares only 58% of its sequence with naturally occurring fluorescent proteins, they said in a study published July 2 to the preprint database bioRxiv. Representatives from EvolutionaryScale, a company formed by former Meta researchers, also outlined the details on June 25 in a statement.
The research team has released a small version of the model under a non-commercial license and will make the large version of the model available to commercial researchers. According to EvolutionaryScale, the technology could be useful in areas ranging from drug discovery to designing new chemicals for degrading plastics.
ESM3 is a large language model (LLM) similar to OpenAI’s GPT-4, which powers the ChatGPT chatbot, and the scientists trained their largest version on 2.78 billion proteins. For each protein, they extracted information about sequence (the order of the amino acid building blocks that make up the protein), structure (the folded three-dimensional shape of the protein), and function (what the protein does). They randomly masked out pieces of information about these proteins and asked ESM3 to predict the missing pieces.
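The masking step can be illustrated with a toy Python sketch. It is not EvolutionaryScale's training code: it handles only the sequence track (ESM3 also masks structure and function information), and the function names, the `_` placeholder and the 15% masking rate are assumptions made for illustration.

```python
import random

MASK = "_"  # hypothetical placeholder marking a hidden amino acid


def mask_sequence(sequence, mask_fraction=0.15, seed=None):
    """Hide a random subset of residues.

    Returns the masked string and a dict mapping each hidden
    position to the amino acid that was removed.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    rng = random.Random(seed)
    # Mask at least one position, and never more than the sequence holds.
    n_masked = min(len(sequence), max(1, round(len(sequence) * mask_fraction)))
    positions = rng.sample(range(len(sequence)), n_masked)
    hidden = {i: sequence[i] for i in positions}
    masked = "".join(MASK if i in hidden else aa for i, aa in enumerate(sequence))
    return masked, hidden


def score_predictions(predictions, hidden):
    """Fraction of hidden positions that a model guessed correctly."""
    correct = sum(predictions.get(i) == aa for i, aa in hidden.items())
    return correct / len(hidden)


# Toy 20-letter sequence used only as example input.
toy_sequence = "MSKGEELFTGVVPILVELDG"
masked, hidden = mask_sequence(toy_sequence, seed=0)
print(masked)
print(score_predictions(hidden, hidden))  # a perfect guesser scores 1.0
```

During training, a model would see only the masked string and be rewarded for recovering the hidden letters; here `score_predictions` stands in for that feedback signal.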
They grew this model out of research the same team was doing while still at Meta. In 2022 they announced ESMFold – a precursor to ESM3 that predicted unknown structures of microbial proteins. That year, Alphabet's DeepMind also predicted structures for 200 million proteins.
The scientists then pointed out that there are limitations to the predictions of these AI models and that protein predictions need to be verified. But the methods can still massively speed up the search for protein structures, because the alternative is to use X-rays to determine protein structures one by one, which is slow and expensive.
However, ESM3 goes beyond merely predicting existing proteins. Using information gleaned from 771 billion unique pieces of structure, function and sequence information, the model can generate new proteins with distinct functions. It was described as a “ChatGPT moment for biology” by one of EvolutionaryScale’s supporters.
In the new study, the researchers used the model to generate a new fluorescent protein – a type of protein that captures light and emits it at a longer wavelength, causing it to glow a new shade of green. These proteins are important to biological researchers, who attach them to the molecules they are interested in studying to track and image them; their discovery and development won a Nobel Prize in Chemistry in 2008.
The model generated 96 proteins with sequences and structures that can produce fluorescence. The researchers then selected the one whose sequence had the least in common with naturally occurring fluorescent proteins. Although this protein was 50 times less bright than natural green fluorescent proteins, ESM3 ran another iteration that produced new sequences with increased brightness – and the result was a green fluorescent protein unlike any found in nature, called “esmGFP”. These iterations, done in moments by AI, would take 500 million years of evolution to achieve, the EvolutionaryScale team estimated.
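As a rough illustration of what "sharing" a percentage of a sequence means, and of picking the candidate that least resembles known proteins, here is a minimal Python sketch. It assumes the sequences are already aligned to equal length, which real comparisons achieve with an alignment step that is omitted here. The function names and the toy strings are hypothetical, and this is not the procedure described in the study.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def least_similar(candidates, references):
    """Return (name, identity) for the candidate whose closest
    reference match is the lowest of all candidates."""
    def closest(seq):
        return max(percent_identity(seq, ref) for ref in references)

    name = min(candidates, key=lambda n: closest(candidates[n]))
    return name, closest(candidates[name])


# Toy strings, not real proteins.
references = ["ACDEFGHIKL"]
candidates = {
    "cand_a": "ACDEFGHIKL",  # identical to the reference
    "cand_b": "ACDEFGAAAA",  # 6 of 10 positions match
    "cand_c": "ACDEAAAAAA",  # 4 of 10 positions match
}
print(least_similar(candidates, references))  # → ('cand_c', 40.0)
```

By this measure, a protein that shares 58% of its sequence with its nearest natural relative differs from it at roughly four positions in every ten.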
“Right now, we still lack a fundamental understanding of how proteins, especially ‘new to science’ ones, behave when introduced into a living system, but this is an exciting new step that allows us to approach synthetic biology in a new way. AI modeling like ESM3 will enable the discovery of new proteins that the constraints of natural selection would never allow, creating innovations in protein engineering that evolution cannot. It doesn’t take into account the multiple stages of natural selection that create the diversity of life we know. AI-driven protein engineering is intriguing, but we can’t help but feel we might be overconfident in assuming we can outrun the complicated processes worked out by millions of years of natural selection.”