Revolutionizing protein design: AI generates novel sequences

Designing de novo proteins holds immense potential for achieving superior combinations of novel functions and mechanical properties, thereby advancing biological and engineering applications. However, the vast number of possible amino acid sequences, together with the experimental cost of designing novel proteins with targeted structural features, remains a challenge.

In a recent study published in the journal Chem, researchers utilize attention-based diffusion models to efficiently generate novel protein sequences with prescribed secondary structures.

Study: Generative design of de novo proteins based on secondary-structure constraints using an attention-based diffusion model.

About the study

In the present study, researchers discuss two generative deep-learning models that predict amino acid sequences and yield folded three-dimensional (3D) protein structures from secondary-structure design constraints, given either as overall content or as per-residue structure.

The team focused on the mechanical properties of proteins for the analysis and mapping between primary amino acid sequences and secondary protein structures. The models considered conditioning descriptions as inputs to produce amino acid sequences through conditional diffusion based on attention.

The AlphaFold and OmegaFold methods were used to generate 3D protein structures. Two models were trained using the Protein Data Bank (PDB) dataset.

Model A took the fractional content of each secondary-structure type as input, whereas Model B took per-residue secondary-structure assignments as input to construct 3D protein models and predict amino acid sequences. Both models could generate multiple samples, which were then narrowed down by selecting those that best satisfied the conditioning inputs or those that showed the least similarity to known proteins.
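The selection step described above can be illustrated with a minimal sketch. It assumes each generated sample has already been folded and its secondary-structure content measured; the function names, the use of Euclidean distance, and the three-component vectors are hypothetical choices for illustration, not the procedure reported in the study.

```python
import math

def conditioning_error(target, achieved):
    """Euclidean distance between requested and achieved content vectors."""
    if len(target) != len(achieved):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((t - a) ** 2 for t, a in zip(target, achieved)))

def rank_by_fit(target, candidates):
    """Sort (label, achieved_vector) pairs from best to worst fit."""
    return sorted(candidates, key=lambda item: conditioning_error(target, item[1]))

# Made-up illustrative values: (helix, sheet, other) fractions.
target = [0.6, 0.2, 0.2]
candidates = [
    ("sample_1", [0.3, 0.4, 0.3]),
    ("sample_2", [0.55, 0.25, 0.2]),
    ("sample_3", [0.7, 0.0, 0.3]),
]
best_label, best_vector = rank_by_fit(target, candidates)[0]
```

Ranking by least similarity to known proteins would follow the same pattern, with a similarity score in place of the distance.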

The diffusion models used U-Net convolutional neural networks with interlinked transformer and convolutional layering, skip connections, and attention modules to identify noise at every step for subsequent removal.

To assess novelty, the de novo proteins were compared with the Critical Assessment of Structure Prediction (CASP) 14 and 15 target proteins using Basic Local Alignment Search Tool (BLAST) analysis. The generative models constructed protein sequences from random signals under conditioning by reversing the diffusion process step by step. Eight parameters associated with the secondary structure of proteins were assessed using the Define Secondary Structure of Proteins (DSSP) codes.
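For readers unfamiliar with diffusion sampling, the following is a generic DDPM-style sketch of how a sample is built from random values by stepwise denoising. It is not the authors' implementation: the linear noise schedule, the 20-channel per-residue encoding, the argmax decoding, and the placeholder noise model (which stands in for the trained, conditioned U-Net) are all assumptions made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t, linearly spaced (an assumed schedule)."""
    if steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (steps - 1)
    return [beta_start + i * step for i in range(steps)]

def reverse_diffusion(predict_noise, condition, length, steps=50, seed=0):
    """Denoise a (length x 20) array of random values step by step."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(steps)
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    channels = len(AMINO_ACIDS)
    # Start from pure Gaussian noise: the "random signal".
    x = [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(length)]

    for t in reversed(range(steps)):
        eps = predict_noise(x, t, condition)  # model's noise estimate
        scale = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on last step
        x = [
            [
                inv_sqrt_alpha * (x[i][c] - scale * eps[i][c])
                + sigma * rng.gauss(0.0, 1.0)
                for c in range(channels)
            ]
            for i in range(length)
        ]
    return x

def decode(x):
    """Pick the highest-scoring residue at each position (assumed encoding)."""
    return "".join(AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)] for row in x)

def placeholder_noise_model(x, t, condition):
    """Stand-in for the trained U-Net: predicts zero noise everywhere."""
    return [[0.0] * len(row) for row in x]

sequence = decode(reverse_diffusion(placeholder_noise_model, condition=None, length=12))
```

Because the placeholder model ignores both its input and the condition, the resulting sequence is meaningless; the sketch only shows the control flow of stepwise noise removal.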

For model A, the conditioning vector contained eight parameters: α helix, extended parallel and/or anti-parallel β sheet conformation, hydrogen-bonded turn (three, four, or five residues), unstructured, β bridge, 3₁₀ helix, π helix, and bend.
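As a rough illustration of what such a conditioning vector looks like, the sketch below computes the fraction of residues in each of the eight DSSP states from a per-residue DSSP string. The function name and the example string are hypothetical, and the use of "-" for the unstructured state is an assumption of this sketch.

```python
from collections import Counter

# DSSP 8-state codes, in the order the article lists them.
# "-" for the unstructured state is an assumption of this sketch.
DSSP_STATES = ["H", "E", "T", "-", "B", "G", "I", "S"]

def secondary_structure_fractions(dssp: str) -> list[float]:
    """Return the fraction of residues in each of the eight DSSP states."""
    if not dssp:
        raise ValueError("empty DSSP string")
    counts = Counter(dssp)
    unknown = set(counts) - set(DSSP_STATES)
    if unknown:
        raise ValueError(f"unexpected DSSP codes: {sorted(unknown)}")
    n = len(dssp)
    return [counts[state] / n for state in DSSP_STATES]

# Made-up 20-residue example: 8 helix, 4 strand, 2 turn, 6 unstructured.
fractions = secondary_structure_fractions("--HHHHHHTT--EEEE--HH")
```

The eight fractions sum to one, so a request such as "mostly helical with some β sheet" becomes a vector with most of its weight on the first two entries.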

For model B, five cases with varying secondary structure distributions were considered. These included a predominant β sheet, a long α helix with a breaker in the center, a small α helix, a β sheet sandwiched between two α-helical domains, and a partially disordered-helical protein.

Study findings

The diffusion models were found to efficiently design proteins with secondary structure specifications and de novo amino acid sequences that have not been discovered previously.

The generative models provided robust results, even for imperfect inputs and unrealistic designs. As a result, these models have the potential to be expanded to generate proteins with other clinically and functionally relevant properties.

The per-residue secondary structure-based model was more accurate and yielded more diverse amino acid sequences, particularly for α-helical structures.

Both models handled varied design objectives robustly and offered new approaches to discovering superior protein materials and systems. Model A analysis identified several representative cases, such as those with high β sheet content, a mixture of α-helical and β sheet content, pure α-helical content, significantly disordered α-helices, and completely disordered proteins.

AlphaFold and OmegaFold analysis of the predicted β-strand assembly into higher-order filamentous structures yielded comparable results. The BLAST analysis showed that some generated sequences were similar to existing amino acid sequences; sequence novelty could be enhanced by increasing conditioning probabilities or adding noise to conditioning vectors during training.

Model B results showed good agreement with the design objectives, thus confirming that the protein generative model could design de novo proteins with geometric specifications and secondary structure localization. Developing models that provide detailed atomic coordinates could improve protein design.
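One simple way to quantify agreement between a per-residue design target and the secondary structure of the folded result is the fraction of positions whose DSSP code matches. The sketch below is illustrative only: the function name and the two example strings are hypothetical, and the study may have used a different measure.

```python
def per_residue_agreement(target: str, predicted: str) -> float:
    """Fraction of positions where target and predicted DSSP codes match."""
    if len(target) != len(predicted):
        raise ValueError("strings must have the same length")
    if not target:
        raise ValueError("empty input")
    matches = sum(t == p for t, p in zip(target, predicted))
    return matches / len(target)

# Made-up example: the helix ends one residue earlier than requested.
target_ss = "HHHHHHHH--EEEE"
predicted_ss = "HHHHHHH---EEEE"
agreement = per_residue_agreement(target_ss, predicted_ss)
```

Here 13 of the 14 positions match, so the agreement is 13/14.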

For model B, the BLAST analysis indicated 50% to 60% similarity between existing proteins and the generated proteins. Model B generated proteins more effectively than Model A.

Conclusions

The current study reports two deep-learning models that can predict amino acid sequences and 3D protein structures based on secondary-structure design objectives. These novel models are robust, reliable, and can generate new protein sequences not yet discovered from natural mechanisms or systems.

The models generated protein sequences with desired secondary structure conformations. These data could be integrated to obtain a protein sequence using model A, whereas model B could be used to refine the sequence by specifying the residue-level detail of the secondary structures.

The models not only seek to respect the conditional inputs but also yield to the underlying constraints of physically possible secondary structures learned during training. This approach has the potential to accelerate the design of new proteins for use in medicine, industry, and other bioengineering applications.

Further research should include additional conditioning, explore properties of the generated proteins beyond structural objectives, such as biological activity, and increase the sequence diversity of generated proteins relative to existing ones.

Journal reference:
  • Ni, B., Kaplan, D. L., & Buehler, M. J. (2023). Generative design of de novo proteins based on secondary-structure constraints using an attention-based diffusion model. Chem. doi:10.1016/j.chempr.2023.03.02

Written by

Pooja Toshniwal Paharia

Pooja Toshniwal Paharia is an oral and maxillofacial physician and radiologist based in Pune, India. Her academic background is in Oral Medicine and Radiology. She has extensive experience in research and evidence-based clinical-radiological diagnosis and management of oral lesions and conditions and associated maxillofacial disorders.

Citations

Cite this article as:

Toshniwal Paharia, Pooja. (2023, April 25). Revolutionizing protein design: AI generates novel sequences. News-Medical. https://www.news-medical.net/news/20230425/Revolutionizing-protein-design-AI-generates-novel-sequences.aspx.
