In a recent study posted to the medRxiv* preprint server, researchers developed a computational pipeline for the early identification of emerging severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of interest (VOI) by analyzing SARS-CoV-2 genome data and allocating risk scores on the basis of functional and epidemiological parameters.
This news article was a review of a preliminary scientific report that had not undergone peer-review at the time of publication. Since its initial publication, the scientific report has now been peer reviewed and accepted for publication in a Scientific Journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.
Background
The continual emergence of SARS-CoV-2 variants with enhanced immune evasiveness, transmissibility, and replication underscores the need to monitor the genomic evolution of the virus. Early detection of SARS-CoV-2 VOIs could allow variants to be prioritized for experimental evaluation and risk assessment, and could help optimize public health responses to SARS-CoV-2.
About the study
In the present study, researchers developed a computational heuristic framework to rapidly detect novel emerging SARS-CoV-2 VOIs and prioritize them for wet-lab experiments.
SARS-CoV-2 genome sequences were obtained from the Global Initiative on Sharing All Influenza Data (GISAID), GenBank, and Bacterial and Viral Bioinformatics Resource Center (BV-BRC) databases. The sequences were processed to identify high-priority VOIs for wet-lab experimentation. Variants were prioritized according to their epidemiological dynamics and their predicted functional characteristics, quantified as sequence prevalence scores, functional impact scores, and composite scores.
The framework ranked variant constellations (also called covariates) to determine which mutational combinations should be evaluated, and detection of the Omicron variant was used to validate the computational approach. Genomes were aligned pairwise with the reference genome (Wuhan-Hu-1 strain), and variant constellations were extracted, mainly for the SARS-CoV-2 spike (S) protein. Variants were grouped by geography and time, and constellation counts and total isolate counts by date and region were used to compute spatiotemporal epidemiological dynamics, namely the monthly growth rate and prevalence of each variant.
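The spatiotemporal step can be illustrated with a minimal Python sketch. It assumes each sequenced isolate has already been reduced to a hypothetical `(region, month, constellation)` record; the function name `monthly_dynamics` and the definition of growth rate as the ratio of a constellation's prevalence in one month to its prevalence in the previous sampled month are illustrative assumptions, not the authors' published definitions.

```python
from collections import Counter, defaultdict


def monthly_dynamics(isolates):
    """Per-region prevalence and month-over-month growth of each constellation.

    isolates: iterable of (region, month, constellation) tuples, where month is
    a sortable "YYYY-MM" string (hypothetical record layout).
    Returns {(region, constellation): [(month, prevalence, growth), ...]}.
    """
    records = list(isolates)
    totals = Counter((region, month) for region, month, _ in records)  # isolates per region-month
    counts = Counter(records)  # isolates per region-month-constellation

    months_by_region = defaultdict(set)
    for region, month in totals:
        months_by_region[region].add(month)

    dynamics = {}
    for region, constellation in {(r, c) for r, _, c in records}:
        series = []
        previous = None
        for month in sorted(months_by_region[region]):
            # Counter returns 0 for constellations not seen in this region-month.
            prevalence = counts[(region, month, constellation)] / totals[(region, month)]
            # Assumed growth rate: ratio to the previous sampled month's prevalence;
            # undefined (None) for the first month or when the previous value was 0.
            growth = prevalence / previous if previous else None
            series.append((month, prevalence, growth))
            previous = prevalence
        dynamics[(region, constellation)] = series
    return dynamics
```

Defining growth as a prevalence ratio keeps the measure comparable across regions with very different sequencing volumes, which is one plausible reason to normalize by total isolate counts rather than use raw counts.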
Sequence prevalence scores were calculated from GISAID data available in November 2021 (the period in which Omicron emerged), covering the three most recent months, to provide the epidemiological parameters for the heuristic scoring component of the pipeline, which flags SARS-CoV-2 lineages that may raise concern. Each country-month combination in which a variant showed >5% sequence prevalence, or a more than five-fold increase in growth rate from the previous month, was assigned a score of 1. These scores were summed across all country-month combinations to obtain the final sequence prevalence score.
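The scoring rule can be sketched in Python as follows. This is a minimal illustration, assuming per-country monthly series of `(month, prevalence, growth)` tuples in which growth is the ratio of a month's prevalence to the previous month's; the function name `sequence_prevalence_score` and this input layout are hypothetical, while the 5% prevalence and five-fold growth cutoffs come from the study description.

```python
def sequence_prevalence_score(series_by_country, recent_months,
                              prevalence_cutoff=0.05, growth_cutoff=5.0):
    """Count the country-month combinations that flag one variant.

    series_by_country: {country: [(month, prevalence, growth), ...]} where
    prevalence is a fraction and growth is this month's prevalence divided by
    the previous month's (None if undefined); this layout is an assumption.
    recent_months: the months to score (e.g. the three most recent ones).
    """
    recent = set(recent_months)
    score = 0
    for series in series_by_country.values():
        for month, prevalence, growth in series:
            if month not in recent:
                continue  # only the scoring window contributes
            # Each qualifying country-month contributes exactly 1 point.
            if prevalence > prevalence_cutoff or (growth is not None and growth > growth_cutoff):
                score += 1
    return score
```

Because each country-month adds at most one point, a variant that is rising in many countries at once outscores one that is very common in a single country, which matches the intent of an early-warning signal.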
Functional impact scores (FIS) were derived by determining the positional overlap between a constellation's mutations and sequence features of concern (SFoCs), i.e., functionally important regions of SARS-CoV-2 S, and summing the scores of the overlapping SFoCs. SFoC scores were based on the impact of mutations on replication, immune evasion, binding to angiotensin-converting enzyme 2 (ACE2) receptors or monoclonal antibodies, and neutralization by sera from vaccinated or previously infected individuals. Composite scores (CS) were calculated by summing the sequence prevalence scores (SPS) and the functional impact scores. Emerging lineage scores were calculated from GISAID data for December 2021 and January 2022 by summing the scores of lineages with growth rates above 15.
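A minimal Python sketch of the overlap-and-sum idea is shown below. It assumes SFoCs are given as hypothetical `(start, end, score)` residue intervals on S, that mutation labels follow the usual `N501Y` / `L24-` notation, and that each SFoC counts once when any mutation in the constellation falls inside it; these choices, and the function names, are assumptions rather than the authors' exact implementation.

```python
import re


def mutation_position(mutation):
    """Extract the residue position from labels such as 'N501Y' or 'L24-'."""
    match = re.search(r"\d+", mutation)
    if match is None:
        raise ValueError(f"no residue position in {mutation!r}")
    return int(match.group())


def functional_impact_score(constellation, sfocs):
    """Sum the scores of SFoC intervals (start, end, score; inclusive) hit by any mutation.

    Assumption: an SFoC contributes once, however many mutations fall inside it.
    """
    positions = [mutation_position(m) for m in constellation]
    return sum(score for start, end, score in sfocs
               if any(start <= p <= end for p in positions))


def composite_score(sps, fis):
    """Composite score as the plain sum of the two component scores."""
    return sps + fis
```

With hypothetical SFoC scores, a constellation containing `G142D` and `N501Y` would collect the scores of the intervals covering residues 142 and 501, and adding its SPS gives the composite score used for ranking.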
Results
The team identified 75 regions in the receptor-binding domain (RBD) of SARS-CoV-2 S that significantly affected the binding of ≤4 antibodies, and 36 regions that significantly affected binding by antibodies in vaccinee or convalescent sera. Twelve sites carrying at least one mutation whose effect exceeded the threshold (>0.1) were identified as indicative of enhanced ACE2 affinity; among these, site 501 was involved in multiple conformational changes in the binding interactions between the S RBD and ACE2.
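The site-selection rule for enhanced ACE2 affinity can be expressed as a short Python sketch. It assumes per-mutation binding effects are available as a hypothetical `{(site, mutant_residue): effect}` mapping (for example, deep mutational scanning-style scores); the function name and input format are illustrative, and only the >0.1 threshold comes from the study description.

```python
def ace2_enhancing_sites(binding_effects, threshold=0.1):
    """Return sorted sites with at least one mutation whose effect exceeds threshold.

    binding_effects: {(site, mutant_residue): effect} (assumed input format).
    The comparison is strict, so an effect exactly equal to the threshold
    does not qualify.
    """
    return sorted({site for (site, _residue), effect in binding_effects.items()
                   if effect > threshold})
```

Collapsing to a set of sites means a position is flagged once even when several substitutions at it exceed the threshold.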
Important sites for adaptive immune responses and SARS-CoV-2 tropism included N-terminal domain (NTD) sites 14 to 20, 140 to 158, and 245 to 264, site 614 of SARS-CoV-2 S, and sites 671 to 692 spanning the furin cleavage site. For Omicron, epidemiological data yielded a low SPS, but its FIS was considerably high, resulting in a high CS. The CS could also capture slight differences among covariates within a single clade. BA.1 was the predominant Omicron lineage in December 2021 and had the highest emerging lineage score.
By January 2022, Omicron lineages such as BA.1, BA.1.1, and BA.2 had diversified into multiple covariates. The BA.2 variant constellation overlapped with that of BA.1 but also carried multiple unique mutational sites. BA.1 carrying an additional R346K mutation had a higher functional impact score than BA.1 without it. In contrast, many covariates had sequence prevalence scores of 0, indicating that their growth dynamics did not signal a significant threat.
Before January 2022, the N440K, G446S, L24-, R346K, A701V, and L452R mutations appeared sporadically. Plots of mutation dynamics showed that G446S and R346K became less prevalent, whereas L24- concomitantly became more prevalent. This finding suggested a fitness advantage for variants carrying L24- and could help distinguish BA.2 from BA.1.
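Tracking individual mutations over time, as in the mutation dynamics described here, can be sketched in Python. The snippet assumes isolates are grouped by month as hypothetical lists of mutation labels; the per-mutation monthly fraction is a straightforward prevalence calculation, while the `trend` helper, which simply compares the first and last months, is an illustrative heuristic and not the authors' method.

```python
from collections import Counter


def mutation_prevalence(isolates_by_month):
    """Fraction of isolates carrying each mutation in each month.

    isolates_by_month: {"YYYY-MM": [mutations_of_isolate_1, ...]} where each
    entry is an iterable of mutation labels (assumed layout).
    Returns {mutation: {month: fraction}}, with 0.0 where a mutation is absent.
    """
    months = sorted(isolates_by_month)
    table = {}
    for month in months:
        isolates = isolates_by_month[month]
        # set() so a mutation is counted at most once per isolate
        counts = Counter(m for mutations in isolates for m in set(mutations))
        for mutation, n in counts.items():
            table.setdefault(mutation, dict.fromkeys(months, 0.0))[month] = n / len(isolates)
    return table


def trend(series):
    """Compare a mutation's first and last monthly prevalence (simple heuristic)."""
    months = sorted(series)
    first, last = series[months[0]], series[months[-1]]
    if last > first:
        return "rising"
    if last < first:
        return "falling"
    return "flat"
```

A real analysis would likely smooth over several months or stratify by region before calling a mutation rising or falling; the two-point comparison only keeps the example short.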
Conclusion
Overall, the study presented a novel computational spatiotemporal framework for the early detection of SARS-CoV-2 variants based on their sequence prevalence, mutation prevalence, and mutational impacts on SARS-CoV-2 functions such as binding to ACE2 receptors. The authors noted several challenges in developing the framework, including fluctuating levels of ambiguity in sequence data during the emergence of Delta and Omicron, accurately quantifying data for computation, and analyzing an enormous and continually growing volume of data.
Journal references:
- Preliminary scientific report.
Wallace, Z. et al. (2022) "Early Detection of Emerging SARS-CoV-2 Variants of Interest for Experimental Evaluation". medRxiv. doi: 10.1101/2022.08.08.22278553. https://www.medrxiv.org/content/10.1101/2022.08.08.22278553v1
- Peer reviewed and published scientific report.
Wallace, Z. S., Davis, J., Niewiadomska, A. M., Olson, R. D., Shukla, M., Stevens, R., Zhang, Y., Zmasek, C. M., and Scheuermann, R. H. (2022) "Early Detection of Emerging SARS-CoV-2 Variants of Interest for Experimental Evaluation". Frontiers in Bioinformatics, 2 (October 2022). doi: 10.3389/fbinf.2022.1020189. https://www.frontiersin.org/articles/10.3389/fbinf.2022.1020189
Article Revisions
- May 13 2023 - The preprint preliminary research paper that this article was based upon was accepted for publication in a peer-reviewed Scientific Journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.