In a recent article published in the journal Nature, researchers used a classical concept in computational linguistics to design a new algorithm, LinearDesign, which optimizes the structural stability and codon usage of messenger ribonucleic acid (mRNA) sequences. For instance, using this algorithm, researchers could optimize mRNA sequences encoding the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike (S) protein and use them in mRNA-based coronavirus disease 2019 (COVID-19) vaccines.
Study: Algorithm for Optimized mRNA Design Improves Stability and Immunogenicity. Image Credit: metamorworks / Shutterstock
Background
All vaccines based on the relatively new mRNA technology share common limitations, such as mRNA instability and rapid degradation, which, in turn, lead to poor protein expression and, subsequently, compromised immunogenicity and druggability of mRNA vaccine products. This instability also critically hinders the storage, distribution, and efficacy of all mRNA vaccines, including COVID-19 and varicella-zoster virus (VZV) vaccines.
Therefore, there is an urgent need for a principled mRNA design algorithm that simultaneously optimizes the stability and codon usage of encoding mRNA sequences to improve protein expression. However, this is a daunting task given the prohibitively large search space: because of the inherent redundancy of the genetic code, there are ~10^632 mRNA sequences that encode the 1,273 amino acids of the SARS-CoV-2 S glycoprotein. Indeed, this seemingly insurmountable computational challenge has left the vast space of highly stable mRNA designs unexplored.
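To make the size of this search space concrete, the following minimal Python sketch multiplies the number of synonymous codons available for each residue. The codon counts are those of the standard genetic code; the function names and the short peptide in the usage note are illustrative assumptions and are not taken from the study.

```python
from math import log10, prod

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 61 sense codons in total).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_mrnas(protein: str) -> int:
    """Exact number of coding sequences that translate to `protein`."""
    return prod(CODON_DEGENERACY[aa] for aa in protein)


def log10_synonymous_mrnas(protein: str) -> float:
    """Base-10 logarithm of the same count, convenient for long proteins."""
    return sum(log10(CODON_DEGENERACY[aa]) for aa in protein)
```

For a hypothetical five-residue peptide such as "MFVFL", the count is 1 x 2 x 4 x 2 x 6 = 96 candidate sequences. Because the count is a product over residues, it grows exponentially with protein length, which is why exhaustive enumeration is out of reach for a 1,273-residue protein.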
About the study
In the present study, researchers used a deterministic finite-state automaton (DFA) to represent the design space of all candidate mRNA sequences and lattice parsing to find the most stable mRNA in the DFA. They also used a weighted DFA to find the optimum balance between mRNA stability and codon usage.
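The idea of encoding every candidate mRNA as a path through a DFA can be illustrated with a small Python sketch. This is a simplified, unminimized construction written for illustration only, not the authors' implementation: the state naming, the function names, and the three-amino-acid codon table (a subset of the standard genetic code) are all assumptions.

```python
from collections import defaultdict

# Subset of the standard genetic code, enough for the toy example.
CODONS = {
    "M": ["AUG"],
    "K": ["AAA", "AAG"],
    "L": ["CUU", "CUC", "CUA", "CUG", "UUA", "UUG"],
}


def build_codon_dfa(protein):
    """Build transitions {state: {nucleotide: next_state}}.

    A state is (residue_index, nucleotides_read_in_current_codon).
    Each residue contributes a small trie over its synonymous codons,
    and the tries are chained one after another.
    """
    transitions = defaultdict(dict)
    for i, aa in enumerate(protein):
        for codon in CODONS[aa]:
            for k in range(3):
                src = (i, codon[:k])
                # Finishing a codon moves to the start of the next residue.
                dst = (i, codon[:k + 1]) if k < 2 else (i + 1, "")
                transitions[src][codon[k]] = dst
    return dict(transitions)


def accepts(transitions, mrna, n_residues):
    """Return True if `mrna` is a path from the start to the final state."""
    state = (0, "")
    for nt in mrna:
        state = transitions.get(state, {}).get(nt)
        if state is None:
            return False
    return state == (n_residues, "")
```

With `dfa = build_codon_dfa("MKL")`, the call `accepts(dfa, "AUGAAACUG", 3)` returns True, while `accepts(dfa, "AUGAAUCUG", 3)` returns False because AAU does not encode lysine. In a weighted variant, each codon path would additionally carry a score reflecting codon usage.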
First, the team identified the mRNA sequence with the lowest minimum free energy (MFE) among all feasible mRNA sequences, e.g., among all mRNAs encoding the SARS-CoV-2 S protein. The standard RNA folding energy model helped the researchers identify the MFE structure among all possible secondary structures of every candidate mRNA sequence, i.e., they applied a kind of minimization-within-a-minimization strategy. They also worked on codon optimality, measured by the Codon Adaptation Index (CAI), which is defined as the geometric average of the relative adaptiveness of each codon in an mRNA sequence.
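The CAI definition above can be sketched in a few lines of Python. The relative adaptiveness of a codon is taken here, as in the usual formulation, to be its usage frequency divided by that of the most frequent synonymous codon. The frequency table is hypothetical and chosen only to keep the arithmetic easy to follow; it is not real codon usage data, and the function names are illustrative.

```python
from math import exp, log

# Hypothetical codon frequencies (NOT real usage data), grouped by amino acid.
TOY_USAGE = {
    "K": {"AAG": 0.6, "AAA": 0.4},
    "E": {"GAG": 0.5, "GAA": 0.25},
    "F": {"UUC": 0.8, "UUU": 0.2},
}


def relative_adaptiveness(usage):
    """Map each codon to its frequency divided by the highest frequency
    among the codons for the same amino acid."""
    weights = {}
    for codon_freqs in usage.values():
        best = max(codon_freqs.values())
        for codon, freq in codon_freqs.items():
            weights[codon] = freq / best
    return weights


def cai(mrna, usage=TOY_USAGE):
    """Geometric mean of the relative adaptiveness of each codon."""
    weights = relative_adaptiveness(usage)
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    # exp(mean(log w)) is the geometric mean, computed in log space.
    return exp(sum(log(weights[c]) for c in codons) / len(codons))
```

Under these toy frequencies, a sequence built only from the most frequent codons, such as "AAGGAGUUC", has a CAI of 1, while "GAAUUU" has a CAI of sqrt(0.5 x 0.25), roughly 0.35.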
Specifically for the SARS-CoV-2 S protein, the team employed eight mRNA sequences, seven of which (A–G) were generated with the LinearDesign algorithm. They distributed these across the low-MFE design space and excluded the first five amino acids when running the algorithm. Though LinearDesign did not address untranslated region (UTR) optimization per se, the mRNA molecules it designed did not interfere much with the structures of widely used UTRs.
Results
For the COVID-19 vaccine, LinearDesign substantially improved mRNA half-life and protein expression. Across the two viral antigens tested, it improved three attributes critical for vaccine performance: stability, protein translation, and in vivo immunogenicity. Additionally, it increased antibody titer by up to 128-fold in vivo compared to the codon-optimization benchmark mRNA sequence H. A COVID-19 mRNA vaccine using benchmark sequence H showed high immunogenicity in two animal models and entered a phase I clinical trial in China.
This principled mRNA design involved no chemical modification, yet it showed high stability, translation efficiency, and immunogenicity. Another advantage is its low manufacturing cost. Furthermore, an mRNA molecule with a lower MFE tended to have more secondary structure, display a more compact shape, and have a smaller hydrodynamic size, so it moved faster during electrophoresis. Accordingly, when the researchers loaded mRNA sequences A–H onto a non-denaturing agarose gel, the sequences with lower calculated MFEs showed higher mobility despite having similar molecular weights.
In VZV mRNA design, with a different UTR pair, LinearDesign also showed substantial improvements, suggesting that its robustness in optimizing the coding region was independent of the UTR pair. Accordingly, all LinearDesign-generated mRNA sequences with three different UTRs showed stronger in vitro protein expression than all benchmarks, suggesting that coding region design and UTR engineering are complementary approaches that could be combined in future work.
Conclusion
When the corresponding energy model becomes available, the study algorithm could be adapted to modified nucleotides. Though it currently considers only stability and codon usage, the generalizability of the lattice representation means that, in the future, it could help optimize other parameters relevant to mRNA design. More importantly, it is a general method for molecule design that could help design mRNAs encoding all kinds of therapeutic proteins, including monoclonal antibodies and anti-cancer drugs.