A recent study posted to the bioRxiv* preprint server investigated the evolution of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2).
Globally, more than 300 million cases of coronavirus disease 2019 (COVID-19) and over 5.48 million related deaths have been officially reported so far. The SARS-CoV-2 genome is a positive-sense single-stranded (ss) RNA, approximately 30 kb long, with an estimated variation rate of 1.12 × 10⁻³ mutations per site per year.
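As a rough worked example, the two figures above can be combined to estimate how many mutations a single genome would be expected to accumulate in a year. This is only a back-of-the-envelope sketch: it assumes the genome is exactly 30,000 nucleotides long and that the rate applies uniformly to every site, neither of which the preprint claims. The function and constant names are illustrative.

```python
# Figures quoted in the article; the exact length of 30,000 nt is an assumption
# standing in for "approximately 30 kb".
GENOME_LENGTH_NT = 30_000
RATE_PER_SITE_PER_YEAR = 1.12e-3


def expected_mutations(genome_length, rate_per_site_per_year, years=1.0):
    """Expected number of mutations per genome, assuming a uniform per-site rate."""
    return genome_length * rate_per_site_per_year * years


per_year = expected_mutations(GENOME_LENGTH_NT, RATE_PER_SITE_PER_YEAR)
per_month = expected_mutations(GENOME_LENGTH_NT, RATE_PER_SITE_PER_YEAR, years=1 / 12)

print(f"Expected mutations per genome per year:  {per_year:.1f}")
print(f"Expected mutations per genome per month: {per_month:.1f}")
```

Under these assumptions the estimate comes to roughly 34 mutations per genome per year, or about three per month.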
*Important notice: bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.
More than 3.7 million SARS-CoV-2 genomes have been sequenced to date, leading to the identification of more than 150,000 unique genetic mutations. Understanding the functional and structural impact of these mutations, and how they correlate with spread in humans, is crucial to uncovering how SARS-CoV-2 adapts to the human environment.
The study
In the present study, researchers examined the impact of mutations on the structural thermodynamic stability (structural stability) of SARS-CoV-2 RNA-dependent RNA polymerase (RdRp) through free energy computation. The authors employed variation spatial profiling (VSP), a Gaussian process regression (GPR)-based machine learning approach.
Using GPR, which generates spatial covariance (SCV) relationships, the authors constructed fitness landscapes to determine the molecular mechanisms driving the fitness of SARS-CoV-2 RdRp in three phases: 1) phase I, the early COVID-19 pandemic before the emergence of any variant of concern (VOC); 2) phase II, marked by the emergence of the Alpha variant; and 3) phase III, when the Delta variant led to a massive global surge in COVID-19 cases.
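To make the idea of a GPR-based landscape more concrete, here is a minimal sketch in plain Python. It is not the authors' VSP pipeline. The squared-exponential kernel, the length scale, the noise term, the two-coordinate description of each mutation, and the toy values are all assumptions chosen for illustration. The sketch only shows the general mechanism: known points are related through a covariance function, and that covariance is used to predict a value, with an uncertainty, at an unmeasured location.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between two points (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gpr_predict(X, y, x_new, length_scale=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean Gaussian process at x_new."""
    n = len(X)
    # Covariance among the known points, with a small noise term on the diagonal.
    K = [[rbf(X[i], X[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Covariance between each known point and the new location.
    k_star = [rbf(xi, x_new, length_scale) for xi in X]
    alpha = solve(K, y)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(x_new, x_new, length_scale) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(var, 0.0)  # clamp tiny negative values from rounding


# Toy data (hypothetical): two coordinates per mutation and one measured value each.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
y = [0.5, -1.0, 0.2, 1.5]

mean_mid, var_mid = gpr_predict(X, y, (0.5, 0.5))
print(f"Prediction between known points: mean={mean_mid:.3f}, variance={var_mid:.3f}")
```

Predictions close to known points carry low variance, while predictions far from any data fall back to the prior (mean 0, variance 1 in this sketch). That is why a landscape built this way comes with a built-in measure of confidence.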
Results
The authors analyzed 87,468 SARS-CoV-2 genomes deposited in the GISAID database between December 24, 2019, and September 8, 2020 (phase I of the pandemic) to understand the evolution of nsp12 (of RdRp) throughout the pandemic. In total, 1,569 missense mutations were detected in nsp12. More than half of them were found in just one virus sample; the remaining 699 mutations were present in at least two virus samples, which reflects at least one human-to-human transmission for these mutations. The team observed that the majority (57%) of the mutations had a neutral to slight impact on RdRp structural stability, while the rest were destabilizing (19%), highly destabilizing (19%), or stabilizing (4%).
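The split between mutations seen once and mutations seen in multiple samples can be illustrated with a short tally. The sketch below is not the authors' code. The `split_by_recurrence` helper, the sample IDs, and the toy observations are hypothetical, and only the two totals at the bottom come from the figures reported above.

```python
from collections import defaultdict


def split_by_recurrence(observations):
    """Split mutations into those seen in one sample and those seen in two or more.

    observations: iterable of (sample_id, mutation) pairs.
    """
    samples_per_mutation = defaultdict(set)
    for sample_id, mutation in observations:
        samples_per_mutation[mutation].add(sample_id)
    singletons = {m for m, s in samples_per_mutation.items() if len(s) == 1}
    recurrent = set(samples_per_mutation) - singletons
    return singletons, recurrent


# Toy observations: the sample IDs and assignments are made up for illustration.
toy = [
    ("s1", "P323L"), ("s2", "P323L"),
    ("s2", "A97V"), ("s3", "A97V"),
    ("s3", "M629I"),
]
singletons, recurrent = split_by_recurrence(toy)
print("Seen once:", sorted(singletons))
print("Seen in two or more samples:", sorted(recurrent))

# Counts reported in the article.
TOTAL_MISSENSE = 1569
RECURRENT = 699
singleton_count = TOTAL_MISSENSE - RECURRENT
singleton_share = singleton_count / TOTAL_MISSENSE
print(f"Single-sample mutations: {singleton_count} ({singleton_share:.1%})")
```

Applied to the reported totals, the subtraction gives 870 single-sample mutations, a little over 55% of the 1,569, which is consistent with "more than half".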
The substitution P323L was the most frequent mutation observed (in 80% of samples in phase I), followed by A97V, observed in about 2% of SARS-CoV-2 genome samples. P323L was found to have a neutral impact on structural stability, but it has been shown to stabilize the binding energy between nsp12 and nsp8-1 for critical RdRp activity. A97V, mapped to the NiRAN (N-terminal Nidovirus RdRp-associated nucleotidyl transferase) domain, had an overall destabilizing effect on RdRp’s structural stability.
Further, a GPR-based fitness landscape based on the structural stability of RdRp was constructed using 63 mutations with a significant stabilizing (4) or destabilizing (59) impact. The team noted that residues with relatively high GPR fitness scores were centered at one of the Zn²⁺ binding motifs, specifically the mutations (C563F, M629I, L636I, L638F, S647I, and A690D) adjacent to the C487-G642-C645-C646 Zn²⁺ binding site. The SCV relationships indicated that this Zn²⁺ binding motif is connected to the nsp12 binding sites for nsp8-1 and the RNA substrate, forming a covariant fitness cluster (cluster I) with a significant impact on structural stability.
P323L, the most frequent mutation of phase I of the pandemic, was also observed in almost all sequences of the SARS-CoV-2 Alpha variant. Fitness cluster I around the aforementioned zinc-binding motif recurred in the fitness landscape and structure of the Alpha variant, which indicated that features associated with the C487-G642-C645-C646 Zn²⁺ binding motif are key for SARS-CoV-2 to iteratively evolve and adapt to the human environment. A second fitness cluster (cluster II), with significant structural impact and high GPR fitness scores, was observed within the NiRAN domain.
Similar to earlier observations, the P323L mutation was found in more than 99.9% of Delta variant sequences. In addition, the G671S substitution was observed in more than 98% of sequences, making it a second basal mutation. Fitness clusters I and II recurred in the Delta VOC, and a new cluster (cluster III) was discovered around the second basal mutation, G671S.
Conclusions
The study's findings demonstrated that the GPR-based approach can help in understanding key evolutionary aspects of viral genomes. The GPR-based fitness scores identified structural covariant fitness clusters that were previously undetected. These clusters were defined by their changes in thermodynamic stability and by residue connections in the transmission of the virus during the pandemic, and they involved structural adjustments for Zn²⁺ binding and multi-residue interactions in the NiRAN domain.
In conclusion, the GPR approach applied in this study essentially emphasized the need for computational and experimental surveillance of multiple SCV-based structure-function relationships of RdRp to monitor and predict key events driving SARS-CoV-2 evolution.