As the COVID-19 pandemic rages across the world, killing hundreds of thousands and infecting over 4.1 million, researchers are racing against time to understand the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The SARS-CoV-2 genome has now been sequenced in many different regions of the world, and researchers are collaborating by sharing information to better understand the make-up and mutations that the virus has undergone.
Now, a new study has examined 5,349 whole SARS-CoV-2 genomes drawn from the shared sequencing archives (NCBI and GISAID), looking for evidence of strain diversification and selective pressure. The study, titled “Controlling the SARS-CoV-2 outbreak, insights from large scale whole genome sequences generated across the world,” was released on the preprint server bioRxiv.
What was the study about?
The researchers write that SARS-CoV-2 most likely arose from a bat coronavirus and has since infected humans and undergone several mutations. The infection has affected humans since December 2019; before that, it had been seen only in wild forms among animals such as bats and pangolins. They wrote that over the past two decades there have been three significant spillovers of viruses from animals to humans, of which the COVID-19 pandemic is the most recent.
Antibodies attacking SARS-CoV-2 virus, the conceptual 3D illustration for COVID-19 treatment. Image Credit: Kateryna Kon / Shutterstock
This news article was a review of a preliminary scientific report that had not undergone peer review at the time of publication. Since its initial publication, the scientific report has been peer reviewed and accepted for publication in a scientific journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.
Molecular diagnostics that identify the cause of the infection have been the cornerstone of case detection. With these tools, physicians could not only identify infected persons but also isolate them and start therapy. Genetic studies like this one will also help in the development of an effective and safe vaccine against the infection. Genomic sequencing studies have been in place from the start of the pandemic, and the team of researchers wrote that these analyses have shown that there have been “relatively few evolutionary selection pressures.”
However, there remains speculation that as the infection spreads into a diverse range of human populations across the world, genetic variation in the viral RNA becomes increasingly likely. The researchers also explained that because restrictions on movement have given the virus less scope to travel between continents, viral lineages have had less chance to intermix, and distinct viral variants or strains have formed as a result. This may have implications for the development of pharmacotherapies, diagnostic tools, and vaccines, they wrote.
What was done?
Researchers across the world have archived their SARS-CoV-2 genomic sequencing data in the NCBI and GISAID databases. The study team accessed these archives and examined 5,349 whole-genome sequences from 62 nations. They looked for evidence of selective evolutionary pressure on the sequences as well as evidence of diversification of the viral strains.
The team wrote, “The COVID-19 pandemic is a global crisis and control strategies, including the development of PCR-based diagnostics, serological assays, monoclonal antibodies, and vaccines, as well as an increased understanding of transmission dynamics, will be informed by knowledge of SARS-CoV-2 sequence data.”
What was found?
From the genomic sequences, the team identified 3,958 SNPs (single nucleotide polymorphisms, or changes at a single nucleotide position in the genome). From these SNPs, they built a phylogenetic tree showing the diversification of SARS-CoV-2. Their results showed that there were “two major clades and six sub-clades” across the world. They wrote, “We report two main clades which are further clustered into six subclades, defined by thirteen informative barcoding mutations. There is overlap between our two main clades (C1 vs. non-C1) and the L and S types defined using only 103 isolates [12], where the two separating or barcoding mutations (in orf1ab and ORF8) are identical.”
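To make the idea of SNPs and barcoding mutations concrete, here is a minimal Python sketch. It is not the study's pipeline: it assumes the sequences are already aligned to a reference of equal length, and the toy sequences, the `call_snps` and `assign_clade` helpers, and the barcode table are all hypothetical illustrations rather than the thirteen mutations reported by the authors.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def call_snps(reference, sample):
    """Return {1-based position: (ref_base, alt_base)} for two aligned sequences.

    Positions where either base is ambiguous (for example 'N') are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return {
        i + 1: (r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s and r in VALID_BASES and s in VALID_BASES
    }

def assign_clade(sample, barcodes):
    """Return the first clade whose barcode bases all match, else 'unassigned'."""
    for clade, sites in barcodes.items():
        if all(sample[pos - 1] == base for pos, base in sites.items()):
            return clade
    return "unassigned"

# Toy, made-up data (hypothetical; real genomes are roughly 30,000 bases long).
REFERENCE = "ACGTACGTAC"
SAMPLES = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACTTACGTAC",
    "isolate_3": "ACTTACGNAC",
    "isolate_4": "ACGTACGTAT",
}
# Hypothetical barcodes: clade name -> {1-based position: defining base}.
HYPOTHETICAL_BARCODES = {
    "toy_clade_1": {3: "T"},
    "toy_clade_2": {10: "T"},
}

snps_by_sample = {name: call_snps(REFERENCE, seq) for name, seq in SAMPLES.items()}
# How many isolates carry a SNP at each variable site.
site_counts = Counter(pos for snps in snps_by_sample.values() for pos in snps)
clades = {name: assign_clade(seq, HYPOTHETICAL_BARCODES) for name, seq in SAMPLES.items()}

for name in SAMPLES:
    print(name, snps_by_sample[name], clades[name])
```

In a real analysis, the set of variable sites would feed a phylogenetic tree builder, and the barcode positions would be chosen from mutations that cleanly separate branches of that tree.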
The team explains that some of the most significant changes in the genome were found in the sequences that code for the spike glycoprotein. There was selective evolutionary pressure on these areas of the sequence, they wrote. The spike glycoprotein forms the spikes on the surface of the virus, which bind to the ACE2 receptors in the respiratory passages of the host. This is how the virus infects humans, the researchers explained. If the gene that codes for the spike protein is altered, its binding capacity to the ACE2 receptors could change. They also found that some of the mutations might cause current molecular diagnostic methods to miss some of the sub-clades of the virus.
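The concern about diagnostics can be illustrated with a small Python sketch that checks whether a mutation falls inside a PCR primer binding site. Everything here is assumed for illustration: the primer, the reference and variant sequences, and the `primer_mismatches` helper are hypothetical, the primer is written in the same orientation as the genome, and real assays can tolerate some mismatches, so a mismatch does not by itself mean a test will fail.

```python
def primer_mismatches(primer, genome, start):
    """Return 0-based offsets within the primer that differ from the genome.

    `start` is the 0-based index in `genome` where the binding site begins.
    """
    site = genome[start:start + len(primer)]
    if len(site) != len(primer):
        raise ValueError("primer extends past the end of the genome")
    return [i for i, (p, g) in enumerate(zip(primer, site)) if p != g]

# Toy, made-up sequences (hypothetical, not real primers or viral genomes).
PRIMER = "GTACGT"
REFERENCE = "AACGTACGTTGA"
VARIANT = "AACGTACTTTGA"  # differs from REFERENCE at 0-based index 7 (G -> T)

# Locate the binding site on the reference the primer was designed against.
SITE_START = REFERENCE.find(PRIMER)

reference_mismatches = primer_mismatches(PRIMER, REFERENCE, SITE_START)
variant_mismatches = primer_mismatches(PRIMER, VARIANT, SITE_START)

print("binding site starts at index", SITE_START)
print("mismatches against reference:", reference_mismatches)
print("mismatches against variant:", variant_mismatches)
```

Running a check like this across all archived genomes would flag sub-clades whose defining mutations overlap a primer or probe site, which is the kind of case the researchers warn about.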
Conclusions and implications
Martin Hibberd, professor of emerging infectious diseases and a senior author of the study, said about the spike protein mutations, “This is exactly what we need to look out for. People are making vaccines and other therapies against this spike protein because it seems a very good target. We need to keep an eye on it and make sure that any mutations don’t invalidate any of these approaches.” He added, “This is an early warning. Even if these mutations are not important for vaccines, other mutations might be, and we need to maintain our surveillance, so we are not caught out by deploying a vaccine that only works against some strains.”
The team concludes that genome sequencing across the world has shown the “challenge of developing SARS-CoV-2 containment tools suitable for everyone”. They call for more data sharing and evaluation to accurately estimate the spread of the pandemic and of the viral strains. The team has also developed an online tool called the “COVID profiler” so that SARS-CoV-2 sequence data can be profiled in this way.
Journal references:
- Preliminary scientific report.
Phelan, J., Deelder, W., Ward, D., Campino, S., Hibberd, M. L., & Clark, T. G. (2020). Controlling the SARS-CoV-2 outbreak, insights from large scale whole genome sequences generated across the world. https://doi.org/10.1101/2020.04.28.066977
- Peer reviewed and published scientific report.
Phelan, J., Deelder, W., Ward, D., Campino, S., Hibberd, M. L., & Clark, T. G. (2022). COVID-Profiler: A webserver for the analysis of SARS-CoV-2 sequencing data. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04632-y. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04632-y
Article Revisions
- Mar 7 2023 - The preprint preliminary research paper that this article was based upon was accepted for publication in a peer-reviewed Scientific Journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.